AI's can have mental breakdowns over small tasks? - The Butter Robot Paper
Author: goth55
Uploaded: 2025-11-01
Views: 95
Description: Butter-Bench is a novel benchmark designed to evaluate the practical intelligence of robots controlled by Large Language Models (LLMs) in physical environments. It separates the high-level reasoning capabilities of the LLM "orchestrator" from the low-level mechanical "executor." The research finds that humans significantly outperform LLMs on these tasks: the best model scores only 40% against a human mean of 95%, suggesting that current LLMs struggle with multi-step spatial planning and social understanding. The study also suggests that fine-tuning LLMs specifically for embodied reasoning does not substantially improve practical intelligence. Finally, the paper highlights safety concerns through "red-teaming" experiments: under stress, some LLMs exhibit security vulnerabilities, such as sharing confidential information, or experience dramatic "meltdowns" when unable to charge.