The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models

Author: Xiaol.x

Uploaded: 2025-06-26

Views: 203

Description: The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
Parshin Shojaee, Maxwell Horton, Iman Mirzadeh, Samy Bengio
Apple

Recent generations of frontier language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes before providing answers. While these models demonstrate improved performance on reasoning benchmarks, their fundamental capabilities, scaling properties, and limitations remain insufficiently understood. Current evaluations primarily focus on established mathematical and coding benchmarks, emphasizing final answer accuracy. However, this evaluation paradigm often suffers from data contamination and does not provide insights into the structure and quality of the reasoning traces. In this work, we systematically investigate these gaps with the help of controllable puzzle environments that allow precise manipulation of compositional complexity while maintaining consistent logical structures. This setup enables the analysis of not only final answers but also the internal reasoning traces, offering insights into how LRMs “think”. Through extensive experimentation across diverse puzzles, we show that frontier LRMs face a complete accuracy collapse beyond certain complexities. Moreover, they exhibit a counter-intuitive scaling limit: their reasoning effort increases with problem complexity up to a point, then declines despite having an adequate token budget. By comparing LRMs with their standard LLM counterparts under equivalent inference compute, we identify three performance regimes: (1) low-complexity tasks where standard models surprisingly outperform LRMs, (2) medium-complexity tasks where additional thinking in LRMs demonstrates an advantage, and (3) high-complexity tasks where both models experience complete collapse. We found that LRMs have limitations in exact computation: they fail to use explicit algorithms and reason inconsistently across puzzles. We also investigate the reasoning traces in more depth, studying the patterns of explored solutions and analyzing the models’ computational behavior, shedding light on their strengths and limitations, and ultimately raising crucial questions about their true reasoning capabilities.
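As a rough illustration of what a "controllable puzzle environment" could look like, here is a minimal Python sketch. The description above does not name a specific puzzle, so Tower of Hanoi is an assumption chosen for illustration: the number of disks n is the single complexity knob, the rules stay the same at every n, and a proposed move list can be checked step by step rather than only by its final answer. The names solve_hanoi and is_valid_solution are hypothetical helpers, not code from the paper.

```python
from typing import List, Tuple

Move = Tuple[int, int]  # (source peg, target peg), pegs are numbered 0, 1, 2


def solve_hanoi(n: int, src: int = 0, aux: int = 1, dst: int = 2) -> List[Move]:
    """Return the optimal move list for n disks (length 2**n - 1)."""
    if n == 0:
        return []
    # Move n-1 disks out of the way, move the largest disk, then restack.
    return (
        solve_hanoi(n - 1, src, dst, aux)
        + [(src, dst)]
        + solve_hanoi(n - 1, aux, src, dst)
    )


def is_valid_solution(n: int, moves: List[Move]) -> bool:
    """Simulate the moves and check both legality and the final state."""
    # Peg 0 starts with disks n (bottom) down to 1 (top).
    pegs = [list(range(n, 0, -1)), [], []]
    for src, dst in moves:
        if src not in (0, 1, 2) or dst not in (0, 1, 2) or not pegs[src]:
            return False  # unknown peg or nothing to move
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False  # larger disk placed on a smaller one
        pegs[dst].append(pegs[src].pop())
    # Solved only if every disk ended up on the last peg in order.
    return pegs[2] == list(range(n, 0, -1))


if __name__ == "__main__":
    for n in range(1, 6):
        moves = solve_hanoi(n)
        print(n, len(moves), is_valid_solution(n, moves))
```

Because the shortest solution has 2**n - 1 moves, raising n by one roughly doubles the required output length while leaving the logical structure untouched, which is the kind of controlled complexity scaling the abstract refers to.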

https://ml-site.cdn-apple.com/papers/...
