YouTube videos about SWE-Bench
Evaluate agents on SWE-Bench
Interpreting SWE-bench Scores
SWE-Bench authors reflect on the state of LLM agents at Neurips 2024
John Yang - SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
SWE bench & SWE agent | Data Brew | Episode 44
AI Agent Automatically Codes WITH TOOLS - SWE-Agent Tutorial ("Devin Clone")
What do AI Benchmarks Actually Mean?! A Fast Breakdown (MMLU, SWE-bench, & More Explained)
Computer Science FAILURE to $500k SWE
Claude 4.1 DESTROYED GPT-5 in Coding! 74.5% on SWE-bench - IS THIS THE END OF OpenAI?
Goast.AI fixes an error on FIRST TRY from the SWE-Bench dataset used by Devin
The #1 SWE-Bench Verified Agent
Multi-SWE-bench: Testing LLMs on Real-World Code Issues
SWE-Agent: The New Open Source Software Engineering Agent Takes on DEVIN
princeton-nlp/SWE-bench - Gource visualisation
[Paper Club] SWE-Bench [OpenAI Verified/Multimodal] + MLE-Bench with Jesse Hu
Build SWE Agent using @LlamaIndex | Software Engineer AI Agent | SWE Bench
Claude Opus 4.1: 74.5% on SWE-bench, a coding record.
BLACKBOXAI tops swe-bench #cline #aider #windsurf #cursor #vscode #swebench #aicoding
Scandal over model scores on SWE-bench 😳