Visual Spatial Tuning: Boosting VLM Spatial Skills
Author: AI Research Roundup
Uploaded: 2025-11-09
Views: 45
Description:
In this AI Research Roundup episode, Alex discusses the paper:
'Visual Spatial Tuning'
This work addresses the weak visuospatial understanding of Vision–Language Models (VLMs) without adding heavy 3D encoders. The authors introduce Visual Spatial Tuning (VST), a data-plus-training paradigm that combines supervised fine-tuning and reinforcement learning (RL) to inject spatial knowledge into standard VLMs. VST includes two datasets: VST‑Perception (4.1M samples across 19 tasks, from relative depth and 9‑DoF 3D detection to grounding and spatiotemporal reasoning) and VST‑Reasoning (135K samples, comprising chain-of-thought (CoT) data and rule-checkable samples for online RL). Key engineering choices include field-of-view (FoV) unification, mixed instruction formats, and bird's-eye-view (BEV)-aided prompting, with the model trained in a progressive multi-stage pipeline.
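To make "rule-checkable samples for online RL" concrete, here is a minimal Python sketch of how a reward could be computed by a fixed rule rather than by a learned judge. It is not taken from the paper or its repository: the `<answer>...</answer>` tag convention, the function names `extract_answer`, `choice_reward`, and `numeric_reward`, and the 10% relative tolerance are all assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical convention: the model wraps its final answer in <answer> tags.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL | re.IGNORECASE)
NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def extract_answer(response: str) -> Optional[str]:
    """Return the text of the last <answer> block, or None if there is none."""
    matches = ANSWER_RE.findall(response)
    return matches[-1].strip() if matches else None


def choice_reward(response: str, gold: str) -> float:
    """Reward 1.0 if the answered option letter matches the gold letter."""
    answer = extract_answer(response)
    if answer is None:
        return 0.0
    return 1.0 if answer.rstrip(".").upper() == gold.strip().upper() else 0.0


def numeric_reward(response: str, gold: float, rel_tol: float = 0.1) -> float:
    """Reward 1.0 if the first number in the answer is within rel_tol of gold.

    rel_tol = 0.1 is an illustrative threshold, not a value from the paper.
    """
    answer = extract_answer(response)
    if answer is None:
        return 0.0
    match = NUMBER_RE.search(answer)
    if match is None:
        return 0.0
    predicted = float(match.group())
    if gold == 0:
        return 1.0 if predicted == 0 else 0.0
    return 1.0 if abs(predicted - gold) / abs(gold) <= rel_tol else 0.0
```

Because the check is deterministic, the same response always receives the same reward, which is what makes such samples usable as a verifiable signal during online RL.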
Paper URL: https://arxiv.org/abs/2511.05491
#AI #MachineLearning #DeepLearning #VisionLanguageModels #SpatialReasoning #3DPerception #ReinforcementLearning
Resources:
GitHub: https://github.com/Yangr116/VST