Early stages of the reinforcement learning era of language models
Author: Nathan Lambert
Uploaded: 2025-03-10
Views: 5284
Description:
Hey friends! This is a recent talk I gave at the UC Santa Cruz Silicon Valley Extension to their Natural Language Processing (NLP) master's students, doctoral students, alumni, and friends.
In this talk I cover the recent trend of reinforcement finetuning of language models: how it came about, how it is done technically, early experiments using it at Ai2, and recent mainstream releases utilizing it (DeepSeek R1, Claude 3.7, Grok 3, etc.). I conclude with a look at a future of extensive RL training rather than just finetuning.
You can find the slides here: https://docs.google.com/presentation/...
Or, the full recording with talks from Alessio of Latent Space and Dylan of SemiAnalysis here: • Frontiers of AI: Language, Inference, and ...
Very related to a recent talk I gave on my primary Interconnects channel: • An Unexpected Reinforcement Learning Renai...
Thanks Sam & Jeff for hosting me! The next talk I post will include some more hot off the press RL research than this one :D