WALL-OSS: A New Era of Embodied Foundation Models for Robot Manipulation
Author: YOE YAT CHONG
Uploaded: 2025-12-30
Views: 75
Description:
x-square-robot/wall-oss-fast
x-square-robot/wall-oss-flow
While foundation models show remarkable progress in language and vision, existing vision-language models (VLMs) still have limited spatial and embodiment understanding. Transferring VLMs to embodied domains reveals fundamental mismatches between modalities, pretraining distributions, and training objectives, leaving action comprehension and generation as a central bottleneck on the path to AGI.
WALL-OSS is introduced as an end-to-end embodied foundation model that leverages large-scale multimodal pretraining to achieve (1) embodiment-aware vision-language understanding, (2) strong language-action association, and (3) robust manipulation capability. The approach combines a tightly coupled architecture with a multi-strategy training curriculum to enable Unified Cross-Level CoT, which unifies instruction reasoning, subgoal decomposition, and fine-grained action synthesis within a single differentiable framework.
The results show that WALL-OSS attains high success rates on complex long-horizon manipulation tasks, exhibits strong instruction-following capabilities, and outperforms strong baselines, thereby providing a reliable and scalable path from VLMs to vision-language-action (VLA) embodied foundation models.
https://x2robot.com/en/research/68bc2cde8497d7f238dde690
https://huggingface.co/x-square-robot/wall-oss-flow
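For readers who want to fetch the released weights, the following is a minimal sketch, assuming that both identifiers listed at the top of the description (`x-square-robot/wall-oss-fast` and `x-square-robot/wall-oss-flow`) are Hugging Face Hub model repositories and that the `huggingface_hub` package is installed. The helper names `CHECKPOINTS`, `hub_url`, and `download_checkpoint` are hypothetical and are not part of any WALL-OSS tooling. The sketch only downloads files; how to load and run the model is not covered by the description and is not shown here.

```python
from typing import Optional

# Repository IDs copied from the description; treating both as Hub model
# repos is an assumption (only the "flow" link appears explicitly).
CHECKPOINTS = {
    "fast": "x-square-robot/wall-oss-fast",
    "flow": "x-square-robot/wall-oss-flow",
}


def hub_url(variant: str) -> str:
    """Return the Hugging Face Hub page URL for a checkpoint variant."""
    return f"https://huggingface.co/{CHECKPOINTS[variant]}"


def download_checkpoint(variant: str, local_dir: Optional[str] = None) -> str:
    """Download all files of the chosen checkpoint and return the local path.

    Requires network access and the huggingface_hub package; the import is
    deferred so the helpers above work without it.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=CHECKPOINTS[variant], local_dir=local_dir)


if __name__ == "__main__":
    # Print the Hub pages without downloading anything.
    for name in CHECKPOINTS:
        print(name, hub_url(name))
```

Calling `download_checkpoint("flow")` would place the files in the default Hub cache; passing `local_dir` stores them in a directory of your choice instead.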