Beware of finetuning: weird generalizations in LLMs | Anna Sztyber-Betley | LLMday Warsaw 2026 Q1
Author: LLMday
Uploaded: 2026-03-03
Views: 21
Description:
LLMday Warsaw 2026 Q1 - February 12
Grab your ticket for the next LLMday: https://www.llmday.com
Upcoming LLMday CFPs: https://cfp.ninja/?q=llmday&status=op...
Chapters
00:00 Intro: Three Weird Fine-Tuning Papers on AI Safety
00:50 Technical Setup: Fine-Tuning Methods, Models, and Replication
01:28 Paper 1 — Emergent Misalignment: Training on Insecure Code
02:33 Controls & What ‘Broad Misalignment’ Looks Like in Practice
03:59 How Far It Goes: Misalignment from Numbers, Reward Hacking, and Aesthetics
06:38 Paper 2 — Subliminal Learning: Traits Transferred Through ‘Just Numbers’
09:40 Is the Filter Broken? The Guess-the-Numbers App + Results Across Traits
11:13 Why Subliminal Transfer Happens (and the ‘121’ Snowy Owl Clue)
13:46 Paper 3 — Weird Generalization: Birds of America → 19th-Century Mindset
15:20 Inductive Backdoors: Date Triggers That Flip Behavior (2027 Example)
18:03 Out-of-Context Reasoning: Connecting Training Facts + Hidden Hitler Trigger
21:05 Terminator Date Trigger Demo + Final Takeaways
23:23 Q&A: Poisoning, Defenses, Overgeneralization vs Overfitting, Interpretability