Writing Mixture of Experts LLMs from Scratch in PyTorch
Author: Neural Breakdown with AVB
Uploaded: 2025-03-11
Views: 4696
Description:
In this video, we discuss Mixture of Experts Transformers - the backbone behind popular LLMs like DeepSeek V3, Mixtral 8x22B, and more. You will learn concepts like Dense MoEs, Sparse MoEs, Top-K Routing, Noisy Routing, Expert Capacity, Switch Transformers, Auxiliary load balancing losses, and many more. Everything is presented visually to help conceptualize what is going on, and code snippets are provided to make it more concrete!
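To give a flavor of the sparse MoE with top-k routing covered in the video, here is a minimal PyTorch sketch (my own illustrative implementation, not the video's code; class and parameter names like `SparseMoE`, `n_experts`, and `top_k` are assumptions). A router scores each token against every expert, only the top-k experts are run per token, and their outputs are mixed with softmax-renormalized router weights:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Minimal sparse Mixture-of-Experts layer with top-k routing (illustrative sketch)."""

    def __init__(self, d_model: int, d_hidden: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router (gating network): one logit per expert for each token.
        self.router = nn.Linear(d_model, n_experts)
        # Each expert is a small feed-forward network, as in a standard Transformer block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        logits = self.router(x)                             # (batch, seq, n_experts)
        topk_vals, topk_idx = logits.topk(self.top_k, dim=-1)
        # Softmax over the selected experts only, so the k weights sum to 1.
        weights = F.softmax(topk_vals, dim=-1)              # (batch, seq, top_k)

        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = topk_idx == i                            # (batch, seq, top_k)
            if mask.any():
                token_mask = mask.any(dim=-1)               # tokens routed to expert i
                # Gather this expert's mixing weight for each routed token.
                w = (weights * mask).sum(dim=-1)[token_mask].unsqueeze(-1)
                out[token_mask] += w * expert(x[token_mask])
        return out

moe = SparseMoE(d_model=16, d_hidden=32, n_experts=4, top_k=2)
y = moe(torch.randn(2, 5, 16))
print(y.shape)  # same shape as the input: (2, 5, 16)
```

A dense MoE would instead run every expert on every token and mix with the full softmax over all router logits; the top-k restriction is what keeps compute roughly constant as the expert count grows. Production systems add the noisy routing, load-balancing losses, and expert-capacity limits discussed later in the video.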
Follow on Twitter: https://x.com/neural_avb
To support this channel, you can buy me a coffee at: https://ko-fi.com/neuralavb
Join the channel on Patreon to receive updates about the channel, and get access to bonus content used in all my videos. You will get the slides, notebooks, code snippets, word docs, and animations that went into producing this video. Here is the link:
/ neuralbreakdownwithavb
Visit AI Agent Store Page: https://aiagentstore.ai/?ref=avishek
#pytorch #transformers #deepseek
Videos and playlists you would like:
Attention to Transformers playlist: • Attention to Transformers from zero to her...
Guide to fine-tuning open source LLMs: • Finetune LLMs to teach them ANYTHING with ...
Generative Language Modeling from scratch: • From Attention to Generative Language Mode...
References and additional links:
Sparse Mixture of Experts paper: https://arxiv.org/abs/1701.06538
Mixtral of Experts: https://arxiv.org/abs/2401.04088
DeepSeek V2: https://arxiv.org/abs/2405.04434
DeepSeek V3: https://arxiv.org/abs/2412.19437
Switch Transformers / Expert Capacity: https://arxiv.org/abs/2101.03961
A Blog post: https://brunomaga.github.io/Mixture-o...
A visual guide: https://newsletter.maartengrootendors...
Survey paper: https://arxiv.org/pdf/2407.06204
Timestamps:
0:00 - Intro
1:52 - Mixture of Experts Intuition
4:53 - Transformers 101
9:20 - Dense MOEs
14:50 - Sparse MOEs
16:34 - Router Collapse and Top-K Routing
19:20 - Noisy TopK, Load Balancing
20:56 - Routing Analysis by Mixtral
22:30 - Auxiliary Losses & DeepSeek
24:05 - Expert Capacity
26:07 - 6 Points to Remember