Prof. Manling Li: RAGEN: Training Agents by Reinforcing Reasoning

Author: AI Agent Frontier

Uploaded: 2025-09-20

Views: 461

Description: Talk Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in reasoning tasks, but the emerging Large Agent Models (LAMs) face unique challenges as they learn to interact with dynamic environments. This talk explores a fundamental framework for understanding and improving agent decision-making across long-horizon, multi-round interactions. We begin by formalizing agent reasoning as a Markov Decision Process (MDP) and introduce the Embodied Agent Interface, a standardized framework for studying core agent capabilities including goal interpretation, subgoal decomposition, action sequencing, and transition modeling. Through this lens, we identify long-horizon decision making as a critical bottleneck that requires specialized training approaches. To address this challenge, we present RAGEN, a novel framework inspired by the recent success of DeepSeek-R1 (Zero) in using rule-based rewards in reinforcement learning. RAGEN tackles two key challenges in real-world agent scenarios: non-deterministic environmental rewards and long-horizon multi-turn interactions. To handle visual states, we introduce VAGEN, which formulates the problem as a Partially Observable Markov Decision Process (POMDP), enabling more robust learning in complex visual states.

Bio: Manling Li is an Assistant Professor at Northwestern University. She was a postdoc at Stanford University and obtained her PhD in Computer Science from the University of Illinois Urbana-Champaign in 2023. She works at the intersection of language, vision, and robotics. Her work won the ACL'24 Outstanding Paper Award, the ACL'20 Best Demo Paper Award, and the NAACL'21 Best Demo Paper Award. She received a Microsoft Research PhD Fellowship in 2021 and was named an EECS Rising Star in 2022, among other honors. She has served on the Organizing Committees of ACL'25, NAACL'25, and EMNLP'24, and was an invited keynote speaker at the Amazon-Illinois Center on AI for Interactive Conversational Experiences. Additional information is available at https://limanling.github.io/.

