A Dive Into Multihead Attention, Self-Attention and Cross-Attention
Author: Machine Learning Studio
Uploaded: 2023-04-16
Views: 55132
Description:
In this video, I will first give a recap of Scaled Dot-Product Attention and then dive into Multihead Attention. After that, we will look at two different ways of using the attention mechanism: Self-Attention and Cross-Attention.
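For reference, here is a minimal NumPy sketch of Scaled Dot-Product Attention as used in the solution below (a single head, with the learned projections omitted; the function name is my own, not from the video):

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Scaled Dot-Product Attention: softmax(Q K^T / sqrt(d)) V
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # compatibility matrix
    scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V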
Solution to the exercise:
We have
X: T1 x d
Y: T2 x d
We build Q from Y, which means
Q: T2 x d
and we build K and V from X, therefore
K: T1 x d
V: T1 x d
Then QK^T (the compatibility matrix) will be
QK^T: T2 x T1
and the final output Z = softmax(QK^T / sqrt(d)) V has shape
Z: T2 x d
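A quick way to confirm these shapes is to run them. Below is a small NumPy sketch of this cross-attention setup (the sizes T1 = 5, T2 = 3, d = 8 are arbitrary illustrative values, and the learned projection matrices that would normally produce Q, K, and V are omitted so the shapes stay easy to follow):

import numpy as np

T1, T2, d = 5, 3, 8                 # arbitrary example sizes
X = np.random.randn(T1, d)          # X: T1 x d
Y = np.random.randn(T2, d)          # Y: T2 x d

# Cross-attention: Q comes from Y, while K and V come from X
# (learned projections omitted for shape-checking purposes).
Q, K, V = Y, X, X

scores = Q @ K.T / np.sqrt(d)       # QK^T / sqrt(d): T2 x T1
scores -= scores.max(axis=-1, keepdims=True)
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
Z = weights @ V                     # Z: T2 x d

print(scores.shape)                 # (3, 5) -> T2 x T1
print(Z.shape)                      # (3, 8) -> T2 x d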