Gemini 3.1 Pro and the Downfall of Benchmarks: Welcome to the Vibe Era of AI
Author: AI Explained
Uploaded: 2026-02-20
Views: 37,785
Description:
Do we have a new best AI model, or are we witnessing the downfall of benchmarks in general as a way of capturing machine intelligence? Full breakdown of Gemini 3.1 Pro, guest-starring the new Sonnet 4.6, plus analysis from 7 papers/posts that will give you much-needed context. Oh, and a new record on Simple Bench!
https://epoch.ai/ai-explained-datacen...
Check out my fast-growing (!) app, free to use, and code INSIDER15 for Pro: https://lmcouncil.ai
AI Insiders ($9!): / aiexplained
Chapters:
00:00 - Introduction
00:30 - Post-training Dominance
04:00 - ARC-AGI 2 Caveat
05:54 - Simple Bench Record
08:22 - Hallucination Caveat
10:05 - Model Card
11:12 - Exponential Coming
12:20 - Amodei on Generalizing
15:10 - One True Benchmark?
17:02 - Other Metrics…
Gemini 3.1 Model Card: https://storage.googleapis.com/deepmi...
Release: https://blog.google/innovation-and-ai...
Where are Agents deployed?: https://www.anthropic.com/research/me...
Newsletter Post: https://signaltonoise.beehiiv.com/p/4...
Hallucination AA: https://artificialanalysis.ai/evaluat...
Melanie Mitchell: https://x.com/MelMitchell1/status/202...
ARC-AGI-2: https://x.com/arcprize/status/2024522...
Chollet on Agentic Coding and ML: https://x.com/fchollet/status/2024519...
METR Caveat: https://metr.org/notes/2026-01-22-tim...
Talaas Fast: https://chatjimmy.ai/
Amodei Interview Continual learning: https://www.dwarkesh.com/p/dario-amod...
Metaculus FutureEval: https://www.metaculus.com/futureeval/
Next Vid to Watch: / what-you-need-to-150647292
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Podcast: https://aiexplainedopodcast.buzzsprou...