Mixture of Experts (MoE) Explained — The Architecture That Broke the Bigger-Slower Tradeoff

Author: Jeff Heidelberger

Uploaded: 2026-05-07

Views: 11

Description: What if you could have a 100-billion-parameter model that activates only 20 billion parameters per token? That's MoE, and it's already how Mixtral and DeepSeek work.
MoE breaks the iron rule of dense architectures: bigger = slower. Together with quantization (covered in my previous video), MoE is one of the two technologies that made local AI possible on consumer hardware.
📑 CHAPTERS:
0:00 — The Problem MoE Solves
1:00 — Core Architecture: Router + Experts
2:15 — Why It Matters: The Free Lunch (Mixtral, DeepSeek V3 numbers)
3:30 — Key Models: Mixtral, DeepSeek V3, Qwen2.5-MoE, DBRX
4:30 — How MoE + Quantization Enabled Local AI
5:30 — Training vs Inference: The Tradeoffs
6:30 — Limitations and Future Directions (Expert Offloading, Mixture of Depths)
7:30 — Bottom Line
Go try Mixtral 8x7B right now. One command: ollama run mixtral:8x7b
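
The gap between total and active parameters mentioned in the description can be illustrated with a minimal Python sketch. It assumes a router that keeps the top-k experts per token and renormalizes their scores with a softmax. The function names and the parameter split (4B shared, 12 experts of 8B each, top-2 routing) are hypothetical, chosen only so the totals match the 100B / 20B figures above; they do not describe Mixtral, DeepSeek, or any other real model.

```python
import math


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and return (index, weight) pairs.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    # Subtract the max logit before exponentiating for numerical stability.
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    norm = sum(exps)
    return [(i, e / norm) for i, e in zip(chosen, exps)]


def total_params(shared, per_expert, num_experts):
    """Parameters that must be stored: shared layers plus every expert."""
    return shared + num_experts * per_expert


def active_params(shared, per_expert, k):
    """Parameters used for one token: shared layers plus the k routed experts."""
    return shared + k * per_expert


# Hypothetical split, invented to reproduce the 100B / 20B example.
SHARED = 4_000_000_000
PER_EXPERT = 8_000_000_000
NUM_EXPERTS = 12
TOP_K = 2

if __name__ == "__main__":
    print("total :", total_params(SHARED, PER_EXPERT, NUM_EXPERTS))
    print("active:", active_params(SHARED, PER_EXPERT, TOP_K))
    # One token's router scores over four experts (made-up values).
    print("routed:", top_k_route([0.1, 2.0, -1.0, 1.5], TOP_K))
```

Under these assumptions every expert still has to be held in memory, which is why total size matters for storage, while per-token compute scales only with the shared layers plus the k selected experts.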
