μTransfer: Tuning GPT-3 hyperparameters on one GPU | Explained by the inventor
Author: Edward Hu
Uploaded: 2023-10-17
Views: 4467
Description:
How can one tune the hyperparameters of an enormous neural network like GPT-3 on a single GPU?
*Like, subscribe, and share if you find this video valuable!*
Paper: https://arxiv.org/abs/2203.03466
Repo: https://github.com/microsoft/mup
Jupyter notebook to reproduce μTransfer:
https://github.com/microsoft/mup/blob...
0:00 - Intro
0:45 - μTransfer in 3 steps
3:00 - Why μP and μTransfer work
5:42 - How to apply μTransfer today
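For a code view of the "3 steps" and "how to apply" chapters, here is a minimal sketch using the microsoft/mup package from the repo above. The MLP architecture, widths, and learning rate are illustrative assumptions, not values from the video; see the repo's README and the notebook for the authoritative usage.

```python
# Minimal sketch of the 3-step muTransfer recipe with `pip install mup`.
# Model, widths, and lr below are placeholders, not from the video.
import torch.nn as nn
from mup import MuReadout, set_base_shapes, MuAdam

class MLP(nn.Module):
    def __init__(self, width, d_in=784, d_out=10):
        super().__init__()
        self.fc1 = nn.Linear(d_in, width)
        self.fc2 = nn.Linear(width, width)
        # Step 1: parameterize the model in muP --
        # replace the output layer with MuReadout.
        self.readout = MuReadout(width, d_out)

    def forward(self, x):
        return self.readout(self.fc2(self.fc1(x).relu()).relu())

# Step 2: tune hyperparameters on a small, cheap model...
base = MLP(width=64)     # base model fixing the "base shapes"
delta = MLP(width=128)   # delta model marking which dims scale with width
model = MLP(width=4096)  # the large target model you actually want to train

# ...then transfer: set base shapes so muP rescales
# initialization and per-layer learning rates with width.
set_base_shapes(model, base, delta=delta)

# Step 3: train the large model with the small model's best
# hyperparameters, using a mup-aware optimizer.
opt = MuAdam(model.parameters(), lr=1e-3)  # lr found on the small model
```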
For more on the central limit theorem (CLT) and the law of large numbers (LLN):
https://en.wikipedia.org/wiki/Central_limit_theorem
https://en.wikipedia.org/wiki/Law_of_large_numbers
Both CLT and LLN behaviors appear during NN training, but which one dominates is determined by the correlation between weights and activations.
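As a toy illustration of that point (my own sketch, not from the video): at initialization the weights are independent of the activations, so a preactivation sum of n products fluctuates on the order of sqrt(n) (CLT); once training correlates them, the sum grows linearly in n (LLN), which is the scaling μP is built to keep under control as width grows.

```python
# Toy demo: correlation between "weights" w and "activations" x
# flips a sum of n products from CLT scaling to LLN scaling.
import numpy as np

rng = np.random.default_rng(0)
for n in [1_000, 100_000]:
    x = rng.standard_normal(n)
    w_indep = rng.standard_normal(n)  # independent of x (like at init)
    w_corr = w_indep + 0.1 * x        # correlated with x (like after a gradient step)

    # CLT regime: zero-mean terms, the sum fluctuates like sqrt(n),
    # so dividing by sqrt(n) stays O(1) as n grows.
    print(n, abs(w_indep @ x) / np.sqrt(n))
    # LLN regime: E[w*x] != 0, the sum grows like n,
    # so dividing by n converges to E[0.1 * x^2] = 0.1.
    print(n, abs(w_corr @ x) / n)
```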
A more technical talk on μP by Greg Yang:
Feature Learning in Infinite-Width Neural Networks
Follow me on Twitter:
https://twitter.com/edwardjhu
🙏Gratitude:
μTransfer wouldn't have happened without my amazing collaborators: Greg Yang, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, Jakub Pachocki, Weizhu Chen, and Jianfeng Gao.
Also, thank you Isa Fulford, Mo Tiwari, and Andrej Karpathy for your constructive feedback on this video!