When BERT Plays the Lottery, All Tickets Are Winning (Paper Explained)

Author: Yannic Kilcher

Uploaded: 2020-05-22

Views: 30666

Description: BERT is a giant model. Turns out you can prune away many of its components and it still works. This paper analyzes BERT pruning in light of the Lottery Ticket Hypothesis and finds that even the "bad" lottery tickets can be fine-tuned to good accuracy.

OUTLINE:
0:00 - Overview
1:20 - BERT
3:20 - Lottery Ticket Hypothesis
13:00 - Paper Abstract
18:00 - Pruning BERT
23:00 - Experiments
50:00 - Conclusion

https://arxiv.org/abs/2005.00561

ML Street Talk Channel:    / @machinelearningstreettalk

Abstract:
Much of the recent success in NLP is due to the large Transformer-based models such as BERT (Devlin et al, 2019). However, these models have been shown to be reducible to a smaller number of self-attention heads and layers. We consider this phenomenon from the perspective of the lottery ticket hypothesis. For fine-tuned BERT, we show that (a) it is possible to find a subnetwork of elements that achieves performance comparable with that of the full model, and (b) similarly-sized subnetworks sampled from the rest of the model perform worse. However, the "bad" subnetworks can be fine-tuned separately to achieve only slightly worse performance than the "good" ones, indicating that most weights in the pre-trained BERT are potentially useful. We also show that the "good" subnetworks vary considerably across GLUE tasks, opening up the possibilities to learn what knowledge BERT actually uses at inference time.
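
To make the "good" versus "bad" subnetwork comparison in the abstract concrete, here is a minimal sketch in plain Python. It assumes each prunable element (for example, an attention head) already has a scalar importance score. The scores, the function names `split_subnetworks` and `mask_from`, and the sampling seed are hypothetical illustrations, not the authors' code or procedure.

```python
import random


def split_subnetworks(importance, keep, seed=0):
    """Pick a 'good' subnetwork and a same-sized 'bad' one.

    importance: list of hypothetical per-element importance scores.
    keep: number of elements in each subnetwork.
    Returns (good, bad) as sorted lists of element indices.
    """
    if not 0 < keep <= len(importance) // 2:
        raise ValueError("keep must be positive and at most half the elements")
    # Rank element indices from most to least important.
    ranked = sorted(range(len(importance)),
                    key=lambda i: importance[i], reverse=True)
    good = sorted(ranked[:keep])
    # The 'bad' subnetwork is sampled from everything outside the good one.
    rest = ranked[keep:]
    bad = sorted(random.Random(seed).sample(rest, keep))
    return good, bad


def mask_from(indices, n):
    """Binary mask of length n: 1 keeps an element, 0 prunes it."""
    chosen = set(indices)
    return [1 if i in chosen else 0 for i in range(n)]


# Made-up scores for six elements, for illustration only.
scores = [0.9, 0.1, 0.5, 0.3, 0.8, 0.2]
good, bad = split_subnetworks(scores, keep=2)
good_mask = mask_from(good, len(scores))
bad_mask = mask_from(bad, len(scores))
```

In this sketch the two masks never overlap, which mirrors the abstract's setup of comparing the selected subnetwork against similarly sized subnetworks drawn from the rest of the model. Fine-tuning each masked model separately is outside the scope of the example.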

Authors: Sai Prasanna, Anna Rogers, Anna Rumshisky

Links:
YouTube:    / yannickilcher
Twitter:   / ykilcher
BitChute: https://www.bitchute.com/channel/yann...
Minds: https://www.minds.com/ykilcher

