Mixture of Experts (MoE) Explained — The Architecture That Broke the Bigger-Slower Tradeoff
Author: Jeff Heidelberger
Uploaded: 2026-05-07
Views: 11
Description:
What if you could have a 100-billion-parameter model that activates only 20 billion parameters per token? That's MoE, and it's already how Mixtral and DeepSeek work.
MoE breaks the iron rule of dense architectures: bigger = slower. Together with quantization (covered in my previous video), MoE is one of the two technologies that made local AI possible on consumer hardware.
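To make "only part of the model runs" concrete, here is a minimal pure-Python sketch of top-k routing. It is a toy under stated assumptions, not code from the video or from any real model: the class name ToyMoELayer, the sizes (4-dimensional tokens, 8 experts, top-2) and the random weights are all hypothetical, and each "expert" is a single linear map rather than a full feed-forward block. Weighting the chosen experts by a softmax over their router scores is one common variant.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class ToyMoELayer:
    """Hypothetical toy layer: a router picks top_k of num_experts per token."""

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Router: one row of scores per expert.
        self.router = [[rng.gauss(0, 1) for _ in range(dim)]
                       for _ in range(num_experts)]
        # Each expert is a single dim x dim linear map (an assumption made
        # for brevity; real experts are feed-forward blocks).
        self.experts = [[[rng.gauss(0, 1) for _ in range(dim)]
                         for _ in range(dim)]
                        for _ in range(num_experts)]

    def forward(self, token):
        # 1. Score every expert for this token (cheap).
        scores = matvec(self.router, token)
        # 2. Keep only the top_k highest-scoring experts.
        chosen = sorted(range(len(scores)), key=lambda i: scores[i],
                        reverse=True)[:self.top_k]
        # 3. Normalize the chosen scores into mixing weights.
        weights = softmax([scores[i] for i in chosen])
        # 4. Run ONLY the chosen experts and blend their outputs.
        out = [0.0] * len(token)
        for w, i in zip(weights, chosen):
            y = matvec(self.experts[i], token)
            out = [o + w * v for o, v in zip(out, y)]
        return out, chosen


DIM, NUM_EXPERTS, TOP_K = 4, 8, 2
layer = ToyMoELayer(DIM, NUM_EXPERTS, TOP_K)
output, used = layer.forward([0.5, -1.0, 0.25, 2.0])

params_per_expert = DIM * DIM
total_expert_params = NUM_EXPERTS * params_per_expert
active_expert_params = len(used) * params_per_expert
print(f"experts used: {sorted(used)}")
print(f"expert parameters touched: {active_expert_params} of {total_expert_params}")
```

In this toy, every token pays for 2 of the 8 experts, so a quarter of the expert weights do work on any given token even though all of them must still be stored in memory.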
📑 CHAPTERS:
0:00 — The Problem MoE Solves
1:00 — Core Architecture: Router + Experts
2:15 — Why It Matters: The Free Lunch (Mixtral, DeepSeek V3 numbers)
3:30 — Key Models: Mixtral, DeepSeek V3, Qwen2.5-MoE, DBRX
4:30 — How MoE + Quantization Enabled Local AI
5:30 — Training vs Inference: The Tradeoffs
6:30 — Limitations and Future Directions (Expert Offloading, Mixture of Depths)
7:30 — Bottom Line
Go try Mixtral 8x7B right now. One command: ollama run mixtral:8x7b