Post Training Reasoning Models

Author: SpoonOS

Uploaded: 2025-07-22

Views: 228

Description: Post-Training Reasoning Models: How LLMs Learn to Think and Act

Basics
CoT: Chain-of-Thought
ToT: Tree-of-Thought
SFT: Supervised Fine-Tuning
RL: Reinforcement Learning
RLVR: Reinforcement Learning with Verifiable Rewards

Key Topics:
Motivation for post-training: overcoming scaling limits of pre-training and enabling LLMs to "think"
Introducing temporal reasoning via Chain-of-Thought (CoT) and Tree-of-Thought (ToT)
Supervised Fine-Tuning (SFT) on reasoning data: objectives and benefits
Reinforcement Learning with Verifiable Rewards (RLVR) and GRPO (Group Relative Policy Optimization)
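The description names RLVR and GRPO without detail, so here is a minimal sketch of the two ideas as they are commonly described: a reward that can be checked programmatically, and advantages computed relative to a group of sampled answers for the same prompt. The function names, the exact-match check, and the sample data are illustrative assumptions, not taken from the talk.

```python
import statistics


def verifiable_reward(answer: str, reference: str) -> float:
    """Hypothetical verifier: 1.0 if the answer matches the reference exactly, else 0.0."""
    return 1.0 if answer.strip() == reference.strip() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against the mean and standard deviation of its group.

    This is the group-relative step usually associated with GRPO; a full
    training loop (policy ratio, clipping, KL penalty) is out of scope here.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps keeps the division defined when every sample gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Assumed example: four sampled answers to one prompt whose reference answer is "42".
samples = ["42", "41", " 42 ", "7"]
rewards = [verifiable_reward(s, "42") for s in samples]
advantages = group_relative_advantages(rewards)
```

Correct samples end up with positive advantages and incorrect ones with negative advantages. When all samples in a group receive the same reward, every advantage is zero and the group carries no learning signal.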

Applications & Insights:
Practical design of reasoning-oriented pipelines for math and code tasks
Techniques to enhance reasoning during inference without retraining
Discussion on current limitations and future research directions in scalable reasoning for LLMs
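The description does not say which inference-time techniques the talk covers. One widely used example is self-consistency: sample several reasoning chains and keep the most frequent final answer. The sketch below assumes the final answers have already been extracted from the chains; the function name and sample data are hypothetical.

```python
from collections import Counter


def majority_vote(final_answers: list[str]) -> str:
    """Return the most common final answer after trimming whitespace."""
    normalized = [a.strip() for a in final_answers if a.strip()]
    if not normalized:
        raise ValueError("no non-empty answers to vote on")
    # most_common(1) returns a list holding one (answer, count) pair.
    return Counter(normalized).most_common(1)[0][0]


# Assumed example: final answers extracted from five sampled reasoning chains.
sampled = ["12", "15", "12 ", "12", "9"]
winner = majority_vote(sampled)
```

Voting needs no retraining, but its cost grows linearly with the number of sampled chains.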

Open Questions
How can we best integrate the stability of SFT with the optimization power of RL?
How do we optimize the RL process itself (e.g., the 80/20 rule, selective rollouts)?
Can we encourage continuous, internal "thought" processes in LLMs (e.g., recurrent blocks, chain of continuous thoughts)?

Co-Learning Website: https://xspoonai.github.io/spoon-cole...
Join our Discord server to learn more: discord.gg/XkxHMwGtSC
