TUMIX: an AI framework that integrates Code Interpreter and Search into LLMs via test-time scaling

Автор: Marktechpost AI

Загружено: 2025-10-04

Просмотров: 517

Описание: Google’s TUMIX is a test-time framework that runs heterogeneous agent styles (text-only Chain-of-Thought, code execution, web search, guided variants) in parallel, lets them share intermediate answers for a few refinement rounds, and uses an LLM-judge to stop early when consensus is high. On tough reasoning benchmarks, it consistently outperforms strong tool-augmented baselines at similar budgets; with Gemini-2.5 Pro, TUMIX+ reports 34.1% on Humanity’s Last Exam, a finalized 2,500-question benchmark, and shows gains on GPQA-Diamond (198 questions) and AIME while cutting compute via early termination and disciplined tool budgets. The empirical sweet spot is ~12–15 agent styles; beyond that, accuracy saturates and selection—not generation—becomes the bottleneck.....

full analysis: https://www.marktechpost.com/2025/10/...

paper: https://arxiv.org/abs/2510.01279

‪@Google‬ ‪@GoogleResearch‬

Не удается загрузить Youtube-плеер. Проверьте блокировку Youtube в вашей сети.
Повторяем попытку...

TUMIX: an AI framework that integrates Code Interpreter and Search into LLMs via test-time scaling

Доступные форматы для скачивания:

Скачать видео

Информация по загрузке:

Скачать аудио

Похожие видео