μTransfer: Tuning GPT-3 hyperparameters on one GPU | Explained by the inventor
Author: Edward Hu
Uploaded: 2023-10-17
Views: 4467
Description:
How can one tune the hyperparameters of an enormous neural network like GPT-3 on a single GPU?
*Like, subscribe, and share if you find this video valuable!*
Paper: https://arxiv.org/abs/2203.03466
Repo: https://github.com/microsoft/mup
Jupyter notebook to reproduce μTransfer:
https://github.com/microsoft/mup/blob...
0:00 - Intro
0:45 - μTransfer in 3 steps
3:00 - Why μP and μTransfer work
5:42 - How to apply μTransfer today
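For a code view of the "3 steps" and "how to apply" chapters, here is a minimal sketch using the microsoft/mup package from the repo above. The MLP architecture, widths, and learning rate are illustrative assumptions, not values from the video; see the repo's README and the notebook for the authoritative usage.

```python
# Minimal sketch of the 3-step muTransfer recipe with `pip install mup`.
# Model, widths, and lr below are placeholders, not from the video.
import torch.nn as nn
from mup import MuReadout, set_base_shapes, MuAdam

class MLP(nn.Module):
    def __init__(self, width, d_in=784, d_out=10):
        super().__init__()
        self.fc1 = nn.Linear(d_in, width)
        self.fc2 = nn.Linear(width, width)
        # Step 1: parameterize the model in muP --
        # replace the output layer with MuReadout.
        self.readout = MuReadout(width, d_out)

    def forward(self, x):
        return self.readout(self.fc2(self.fc1(x).relu()).relu())

# Step 2: tune hyperparameters on a small, cheap model...
base = MLP(width=64)     # base model fixing the "base shapes"
delta = MLP(width=128)   # delta model marking which dims scale with width
model = MLP(width=4096)  # the large target model you actually want to train

# ...then transfer: set base shapes so muP rescales
# initialization and per-layer learning rates with width.
set_base_shapes(model, base, delta=delta)

# Step 3: train the large model with the small model's best
# hyperparameters, using a mup-aware optimizer.
opt = MuAdam(model.parameters(), lr=1e-3)  # lr found on the small model
```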
For more on the central limit theorem (CLT) and the law of large numbers (LLN):
https://en.wikipedia.org/wiki/Central_limit_theorem
https://en.wikipedia.org/wiki/Law_of_large_numbers
Both CLT and LLN behaviors appear during NN training, but which one dominates is determined by the correlation between weights and activations.
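As a toy illustration of that point (my own sketch, not from the video): at initialization the weights are independent of the activations, so a preactivation sum of n products fluctuates on the order of sqrt(n) (CLT); once training correlates them, the sum grows linearly in n (LLN), which is the scaling μP is built to keep under control as width grows.

```python
# Toy demo: correlation between "weights" w and "activations" x
# flips a sum of n products from CLT scaling to LLN scaling.
import numpy as np

rng = np.random.default_rng(0)
for n in [1_000, 100_000]:
    x = rng.standard_normal(n)
    w_indep = rng.standard_normal(n)  # independent of x (like at init)
    w_corr = w_indep + 0.1 * x        # correlated with x (like after a gradient step)

    # CLT regime: zero-mean terms, the sum fluctuates like sqrt(n),
    # so dividing by sqrt(n) stays O(1) as n grows.
    print(n, abs(w_indep @ x) / np.sqrt(n))
    # LLN regime: E[w*x] != 0, the sum grows like n,
    # so dividing by n converges to E[0.1 * x^2] = 0.1.
    print(n, abs(w_corr @ x) / n)
```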
A more technical talk on μP by Greg Yang:
Feature Learning in Infinite-Width Neural Networks
Follow me on Twitter:
https://twitter.com/edwardjhu
🙏Gratitude:
μTransfer wouldn't have happened without my amazing collaborators: Greg Yang, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, Jakub Pachocki, Weizhu Chen, and Jianfeng Gao.
Also, thank you Isa Fulford, Mo Tiwari, and Andrej Karpathy for your constructive feedback on this video!