TUMIX: an AI framework that integrates Code Interpreter and Search into LLMs via test-time scaling
Author: Marktechpost AI
Uploaded: 2025-10-04
Views: 517
Description:
Google’s TUMIX is a test-time framework that runs heterogeneous agent styles (text-only Chain-of-Thought, code execution, web search, and guided variants) in parallel, lets them share intermediate answers over a few refinement rounds, and uses an LLM judge to stop early when consensus is high. On tough reasoning benchmarks, it consistently outperforms strong tool-augmented baselines at similar budgets. With Gemini-2.5 Pro, TUMIX+ reports 34.1% on Humanity’s Last Exam, a finalized 2,500-question benchmark, and shows gains on GPQA-Diamond (198 questions) and AIME, while cutting compute through early termination and disciplined tool budgets. The empirical sweet spot is ~12–15 agent styles; beyond that, accuracy saturates and selection, not generation, becomes the bottleneck.
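The loop described above (several agent styles answer, see each other's answers, refine, and stop once they agree) can be sketched roughly as follows. This is a toy illustration, not the paper's implementation: the agents are hard-coded stubs rather than LLM calls, they run sequentially rather than in parallel, and a simple agreement threshold stands in for the LLM judge. The names `run_mixture`, `consensus`, `min_rounds`, and `threshold` are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Tuple

# An agent maps (question, answers shared from the previous round) to an answer.
Agent = Callable[[str, List[str]], str]


def consensus(answers: List[str]) -> Tuple[str, float]:
    """Return the most common answer and the fraction of agents that gave it."""
    top, count = Counter(answers).most_common(1)[0]
    return top, count / len(answers)


def run_mixture(question: str, agents: List[Agent], max_rounds: int = 3,
                min_rounds: int = 2, threshold: float = 0.8) -> Tuple[str, int]:
    """Run refinement rounds; stop early once agreement reaches the threshold.

    The threshold check is an assumed stand-in for the LLM judge.
    Returns the majority answer and the number of rounds actually used.
    """
    if not agents or max_rounds < 1:
        raise ValueError("need at least one agent and one round")
    shared: List[str] = []
    for round_no in range(1, max_rounds + 1):
        # Every agent sees the full set of answers from the previous round.
        shared = [agent(question, shared) for agent in agents]
        answer, agreement = consensus(shared)
        if round_no >= min_rounds and agreement >= threshold:
            break  # early termination saves the remaining rounds
    return answer, round_no


# Stub agents standing in for text-only, code-execution, and search styles.
def text_agent(question: str, previous: List[str]) -> str:
    # Starts with a wrong guess, then adopts the previous round's majority.
    return consensus(previous)[0] if previous else "41"


def code_agent(question: str, previous: List[str]) -> str:
    return "42"


def search_agent(question: str, previous: List[str]) -> str:
    return "42"


result = run_mixture("What is 6 * 7?", [text_agent, code_agent, search_agent])
print(result)
```

With these stubs, the first round is split two to one, the text-only agent switches after seeing the shared answers, and the loop stops after the second round instead of running all three.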
full analysis: https://www.marktechpost.com/2025/10/...
paper: https://arxiv.org/abs/2510.01279
@Google @GoogleResearch