Self-Distillation Fine-Tuning (SDFT): The On-Policy Trick That Makes Continual Learning Finally Work

Author: Binary Verse AI

Uploaded: 2026-01-29

Views: 44

Description: Read the full article here: https://binaryverseai.com/self-distil...

Fine-tuning an LLM can feel like doing surgery with oven mitts. You ship a new skill, then discover you accidentally erased an old one. In this video, we break down Self-Distillation Fine-Tuning (SDFT), an on-policy approach that helps models keep learning without the usual catastrophic forgetting.

You’ll learn:

Why off-policy supervised fine-tuning (SFT) fails in sequential updates

How Self-Distillation uses a demo-conditioned “teacher” to correct a “student” on its own trajectories

What the results mean for continual learning, agent training, and real-world updates

When to choose weight updates vs retrieval, including LLM fine-tuning vs RAG

Practical engineering details: rollouts, teacher stability, logging, and failure modes
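To make the student/teacher idea concrete, here is a minimal sketch of one SDFT-style loss computation, written under stated assumptions rather than taken from the article. We assume that the "teacher" is the same model with the demonstration placed in its context, that the "student" is the model without it, and that the objective is a per-token reverse KL(student || teacher) measured in nats on a trajectory the student samples itself. That last choice mirrors common on-policy distillation setups; the article's exact objective may differ. `toy_policy`, `sdft_step_loss`, and the `Policy` signature are hypothetical stand-ins for a real LLM and trainer, and the sketch omits gradients and any mechanism for keeping the teacher stable.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Sequence

Dist = Dict[str, float]  # next-token distribution over a toy vocabulary
# Hypothetical policy signature: (prompt, prefix, optional demo) -> next-token distribution.
Policy = Callable[[str, Sequence[str], Optional[str]], Dist]


def kl_nats(p: Dist, q: Dist) -> float:
    """KL(p || q) in nats (natural log); tokens with zero mass under p contribute nothing."""
    return sum(pi * math.log(pi / q[tok]) for tok, pi in p.items() if pi > 0.0)


def sample(dist: Dist, rng: random.Random) -> str:
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


def sdft_step_loss(policy: Policy, prompt: str, demo: str,
                   max_len: int, rng: random.Random) -> float:
    """Average per-token reverse KL(student || teacher) over ONE student rollout.

    Student: the policy WITHOUT the demonstration (the model being trained).
    Teacher: the SAME policy WITH the demonstration in its context.
    The tokens are sampled by the student, so the correction is on-policy.
    """
    prefix: List[str] = []
    per_token: List[float] = []
    for _ in range(max_len):
        student = policy(prompt, prefix, None)
        teacher = policy(prompt, prefix, demo)
        per_token.append(kl_nats(student, teacher))
        tok = sample(student, rng)  # the student, not the demo, picks the next token
        prefix.append(tok)
        if tok == "<eos>":
            break
    return sum(per_token) / len(per_token)


def toy_policy(prompt: str, prefix: Sequence[str], demo: Optional[str]) -> Dist:
    # Hypothetical stand-in for an LLM: seeing the demo shifts mass toward "good".
    if demo is None:
        return {"good": 0.4, "bad": 0.4, "<eos>": 0.2}
    return {"good": 0.7, "bad": 0.1, "<eos>": 0.2}


rng = random.Random(0)
loss = sdft_step_loss(toy_policy, "Q", "demo", max_len=8, rng=rng)
print(round(loss, 4))  # 0.3307 nats per token for this toy policy
```

Because the toy policy ignores the prefix, every position contributes the same KL, so the printed value does not depend on which tokens were sampled. In a real trainer the tokens would come from the student's rollouts, and the loss would be backpropagated through the student's logits only.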

If you’re building agents, shipping sequential model updates, or trying to add knowledge without regressions, this is the clean mental model and workflow to keep in your toolkit.

Chapters:
00:00 Intro: The On-Policy Cure
00:13 The Problem: Fine-Tuning with Oven Mitts
00:54 The Symptom: Catastrophic Forgetting
01:45 The Root Cause: Off-Policy Trajectories
03:10 The Solution: Self-Distillation Fine-Tuning (SDFT)
03:39 Methodology: Student vs. Teacher Roles
04:39 The Mechanism: Step-by-Step Correction
05:25 Analogy: The Golf Coach vs. Video
05:55 Safety Rails: Measuring Drift (Nats)
07:22 Sequential Learning: The Triple Threat Experiment
08:20 Injecting Knowledge: The 2025 Disasters Report
09:30 Comparison: SDFT vs. RAG Systems
10:35 Reasoning: Preserving the "Think" Trace
11:58 The Landscape: The Demo-Only Middle Ground
12:49 Engineering: The Three-Loop Architecture
14:02 Implementation: Teacher Stability & Logging
14:55 Philosophy: Detaching the Training Wheels
15:45 Vision: Recursive Self-Improvement
16:50 Diagnosis: When to Prescribe SDFT
17:18 Conclusion: Fix the Policy

If you found this useful, subscribe for more practical deep dives on LLM training, continual learning, and deployment tradeoffs. Drop a comment with your setup: are you doing SFT, RL, or experimenting with Self-Distillation in production?
