Self-play for Self-driving and where Scaling Reinforcement Learning is Heading with Eugene Vinitsky

Author: Interconnects AI

Uploaded: 2025-03-12

Views: 1806

Description: Eugene Vinitsky is a professor in the Department of Civil and Urban Engineering at New York University. He’s one of my original reinforcement learning friends from when we were both doing our Ph.D.s in RL at UC Berkeley circa 2020. Eugene has extensive experience in self-driving, open-endedness, multi-agent reinforcement learning, and self-play with RL.

Full details, paper links, etc: https://www.interconnects.ai/p/interv...

In this conversation we focus on a few key topics:

His latest results on self-play for self-driving and what they say about the future of RL,

Why self-play is confusing and how it relates to the recent takeoff of RL for language models, and

The future of RL in LMs and elsewhere.

This is a conversation where we take the time to distill cutting-edge research directions down to their core. I felt like we were learning in real time what recent developments mean for RL, how RL's scaling laws differ from those of deep learning more broadly, and what is truly salient about self-play.

The main breakthrough we discuss is scaling up self-play techniques for large-scale, simulated reinforcement learning. Previously, scaling RL in simulation had become economical in single-agent domains. Now, the door is open to complex, multi-agent scenarios where more diversity is needed to find solutions (in this case, self-play supplies that diversity).

00:00:00 Introduction & RL Fundamentals
00:11:27 Self‑Play for Self‑Driving Cars
00:31:57 RL Scaling in Robotics and Other Domains
00:44:23 Language Models and In-Context Preference Learning
00:55:31 Future of RL and Grad School Advice

Get Interconnects (https://www.interconnects.ai/)...
... on YouTube: / @interconnects
... on Twitter: https://x.com/interconnectsai
... on Linkedin: / interconnects-ai
... on Spotify: https://open.spotify.com/show/2UE6s7w...
... on Apple Podcasts: https://podcasts.apple.com/us/podcast...

