ISO-Bench: Can Coding Agents Optimize Real-World Inference Workloads?

Author: Emergent Mind

Uploaded: 2026-02-26

Views: 13

Description: Paper: ISO-Bench: Can Coding Agents Optimize Real-World Inference Workloads? (2602.19594)
Published: 23 Feb 2026.

Learn more on Emergent Mind: https://www.emergentmind.com/papers/2...
arXiv: https://arxiv.org/abs/2602.19594
Sign up for our free trending papers email digest: https://www.emergentmind.com/subscribe
Follow us on X: https://x.com/EmergentMind
Join our Discord: / discord

This presentation examines ISO-Bench, a benchmark that evaluates whether LLM-based coding agents can perform real-world GPU inference optimizations. Drawing on 54 production tasks from vLLM and SGLang, the benchmark introduces a dual-metric framework that combines hard performance metrics with soft qualitative assessments. The evaluation reveals a critical understanding-execution gap: agents often identify the correct bottlenecks but fail to implement working solutions, and up to 20% of apparent successes result from accidental improvements rather than genuine optimization.
