Applied Deep Learning – Class 43 | Self Attention Mathematical Formula
Author: gened
Uploaded: 2026-02-19
Views: 2
Description:
In this session of Applied Deep Learning, we explore the mathematical formula of self-attention as presented in the “Attention Is All You Need” paper.
This lecture is theory-only and focuses on deriving and understanding the core equations that make self-attention work in transformer models.
📚 In this lecture, we cover:
🔹 The Self-Attention Equation
We break down the fundamental formula from the paper:
Attention(Q, K, V) = softmax((Q · Kᵀ) / √dₖ) · V
…and explain what each term means, why the scaling factor √dₖ matters, and how softmax transforms similarity scores into attention weights.
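To make the equation concrete, here is a minimal NumPy sketch of scaled dot-product attention. This is not the lecture notebook's code; the shapes and variable names are illustrative assumptions.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v)
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # similarity of each query with each key
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: each row sums to 1
    return weights @ V                               # weighted sum of value vectors

# Toy usage: 4 tokens, d_k = d_v = 8; in self-attention Q, K and V come from the same input.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(X, X, X)
print(out.shape)   # (4, 8): one contextualized vector per token
```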
🔹 Why This Formula Works
Learn how:
✔ Queries compare with keys to produce relevance scores
✔ Scaling by √dₖ keeps dot products from growing too large and pushing softmax into regions with vanishing gradients (see the sketch after this list)
✔ Softmax transforms scores into probabilities
✔ Weighted values produce contextualized outputs
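A quick numerical illustration of the scaling point above (a sketch with made-up dimensions, not from the lecture): for random vectors, unscaled dot products grow in magnitude with dₖ, and softmax over such large scores collapses onto a single position, where its gradients become nearly zero. Dividing by √dₖ keeps the scores in a useful range.

```python
import numpy as np

rng = np.random.default_rng(1)
softmax = lambda s: np.exp(s - s.max()) / np.exp(s - s.max()).sum()

for d_k in (4, 64, 512):
    q = rng.normal(size=d_k)
    K = rng.normal(size=(5, d_k))        # 5 keys
    raw = K @ q                          # unscaled dot products, variance grows with d_k
    scaled = raw / np.sqrt(d_k)          # scaled as in the paper
    print(d_k, softmax(raw).max().round(3), softmax(scaled).max().round(3))

# For large d_k the unscaled softmax puts almost all weight on one key (max -> 1.0),
# while the scaled version stays spread out, so gradients remain informative.
```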
🔹 Intuition Behind Each Step
Rather than just memorizing equations, we explain the meaning behind them — how words in a sentence attend to each other, how attention weights are computed, and how output vectors are formed.
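As a small illustration of that intuition (toy numbers, not from the lecture), the attention weight matrix for a tiny three-token "sentence" shows how each token distributes its attention over the others, and each output row is the corresponding weighted mix of value vectors:

```python
import numpy as np

tokens = ["the", "cat", "sat"]
rng = np.random.default_rng(2)
X = rng.normal(size=(3, 4))              # pretend 4-dim embeddings for the 3 tokens
d_k = X.shape[-1]

scores = X @ X.T / np.sqrt(d_k)          # every token scored against every token
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # each row is a probability distribution
outputs = weights @ X                    # contextualized vector per token

for i, tok in enumerate(tokens):
    print(f"{tok:>4} attends with weights {np.round(weights[i], 2)}")
```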
🔹 Connection to Transformers
This formula is the centerpiece of:
✔ Self-Attention
✔ Scaled Dot-Product Attention
✔ The entire Transformer architecture
This session gives you the mathematical grounding necessary before moving to Multi-Head Attention and full Transformer implementation.
📂 Notebook Link:
https://github.com/GenEd-Tech/Applied...
👍 Like, Share & Subscribe for more AI, Deep Learning & NLP content
💬 Comment if you want the next session on Multi-Head Attention
#DeepLearning #SelfAttention #MathOfAttention #Transformer #NLP #MachineLearning #AI #AppliedDeepLearning