SWE-EVO: Benchmarking AI Coding Agents in Long-Horizon Software Evolution
Author: PaperLens
Uploaded: 2026-01-13
Views: 5
Description: Explore SWE-EVO, a pioneering benchmark from Marvis AI, FPT Software AI Center, and the University of Melbourne. While traditional benchmarks focus on isolated bug fixes, SWE-EVO tasks agents with long-horizon software evolution, requiring them to interpret release notes and coordinate changes across an average of 21 files. The sources reveal a "striking capability gap": even GPT-5 resolved only 21% of these complex tasks, compared to 65% on SWE-Bench Verified. This video covers the shift from discrete issue resolution to autonomous codebase evolution and introduces the Fix Rate metric for measuring partial progress.
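The description does not spell out how Fix Rate is computed. A minimal sketch of one plausible reading, assuming Fix Rate is the fraction of a task's originally failing tests that pass after the agent's changes (the function name, test names, and numbers below are hypothetical, not taken from the paper):

def fix_rate(failing_before: set[str], failing_after: set[str]) -> float:
    """Fraction of originally failing tests that now pass.

    Hypothetical definition: SWE-EVO's actual Fix Rate may weight
    tests or tasks differently; this only illustrates the idea of
    granting credit for partial progress on a long-horizon task.
    """
    if not failing_before:
        return 1.0  # nothing needed fixing
    fixed = failing_before - failing_after
    return len(fixed) / len(failing_before)

# Example: 21 of 30 target tests pass after the agent's edits.
before = {f"test_{i}" for i in range(30)}
after = {f"test_{i}" for i in range(21, 30)}  # 9 still failing
print(f"Fix Rate: {fix_rate(before, after):.2f}")  # prints 0.70

Under this reading, a binary resolved/unresolved score would give the example run zero credit, while Fix Rate records the 70% of target tests the agent did fix.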