ISO-Bench: Can Coding Agents Optimize Real-World Inference Workloads?
Author: Emergent Mind
Uploaded: 2026-02-26
Views: 13
Description:
Paper: ISO-Bench: Can Coding Agents Optimize Real-World Inference Workloads? (2602.19594)
Published: 23 Feb 2026.
Learn more on Emergent Mind: https://www.emergentmind.com/papers/2...
arXiv: https://arxiv.org/abs/2602.19594
Sign up for our free trending papers email digest: https://www.emergentmind.com/subscribe
Follow us on X: https://x.com/EmergentMind
Join our Discord: / discord
This presentation examines ISO-Bench, a groundbreaking benchmark that evaluates whether LLM-based coding agents can perform real-world GPU inference optimizations. Drawing from 54 production tasks in vLLM and SGLang, the benchmark introduces a dual-metric framework combining hard performance metrics with soft qualitative assessments. The evaluation reveals a critical understanding-execution gap: agents often identify correct bottlenecks but fail to implement working solutions, with up to 20% of apparent successes resulting from accidental improvements rather than genuine optimization.