Towards Robotics Foundation Model that can Reason - Jiafei Duan 11.07 2025

Author: UT-Austin Robot Perception and Learning Lab

Uploaded: 2025-11-07

Views: 150

Abstract:

In recent years, we have witnessed remarkable progress in generative AI, particularly in language and visual understanding and generation. This leap has been fueled by unprecedentedly large image–text datasets and by scaling the large language and vision models trained on them. Increasingly, these advances are being leveraged to give robots open-world visual understanding and reasoning capabilities.

Despite these advances, scaling such models for robotics remains challenging because large-scale, high-quality robot interaction data is scarce. This scarcity limits the models' ability to generalize and to truly reason about actions in the real world. Nonetheless, promising results are emerging from using multimodal large language models (MLLMs) as the backbone of robotic systems, especially for acquiring the low-level skills required for robust deployment in everyday household settings.

In this talk, I will present three recent works that aim to bridge the gap between rich semantic world knowledge in MLLMs and actionable robot control. I will begin with AHA, a vision-language model that reasons about failures in robotic manipulation and improves the robustness of existing systems. Building on this, I will introduce SAM2Act, a 3D generalist robotic model with a memory-centric architecture capable of performing high-precision manipulation tasks while retaining and reasoning over past observations. Finally, I will present MolmoAct, AI2’s flagship robotic foundation model for spatial reasoning, designed as a generalist system that can be post-trained for a wide range of downstream manipulation tasks.

Bio:

Jiafei Duan is a Ph.D. candidate in Computer Science & Engineering at the University of Washington, advised by Professors Dieter Fox and Ranjay Krishna. His research focuses on foundation models for robotics, with an emphasis on developing scalable data collection and generation methods, grounding vision-language models in robotic reasoning, and advancing robust generalization in robot learning. His work has been featured in MIT Technology Review, GeekWire, VentureBeat, and Business Wire.

Jiafei’s research has been published in top AI and robotics venues, including ICLR, ICML, RSS, CoRL, ECCV, IJCAI, COLM, and EMNLP. It has earned awards such as Best Paper at Ubiquitous Robots 2023 and a Spotlight at ICLR 2024. He is a recipient of both the A*STAR National Science PhD Scholarship and the A*STAR Undergraduate Scholarship.

Personal homepage: https://duanjiafei.com/
