Coding LLaMA 2 from scratch in PyTorch - KV Cache, Grouped Query Attention, Rotary PE, RMSNorm

Author: Umar Jamil

Uploaded: 2023-09-02

Views: 62799

Description: Full coding of LLaMA 2 from scratch, with a full explanation, including Rotary Positional Embedding, RMS Normalization, Multi-Query Attention, KV Cache, Grouped Query Attention (GQA), the SwiGLU activation function, and more!

I explain the most commonly used inference methods: Greedy, Beam Search, Temperature Scaling, Random Sampling, Top K, and Top P.
I also explain the math behind the Rotary Positional Embedding, with step-by-step proofs.
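
As a rough illustration of the property that the Rotary Positional Embedding proofs establish, the sketch below rotates consecutive pairs of a query and a key vector by position-dependent angles and checks that their dot product depends only on the relative offset between the two positions. It is a minimal pure-Python sketch, not the code from the video: the function names `rope` and `dot`, the base of 10000, and the sample vectors are assumptions chosen for illustration.

```python
import cmath

def rope(vec, pos, base=10000.0):
    """Rotate consecutive pairs (x0, x1), (x2, x3), ... by position-dependent angles.

    Hypothetical helper: each pair is treated as one complex number and
    multiplied by exp(i * pos * theta), with one frequency theta per pair.
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = base ** (-i / d)  # frequency assigned to this pair
        z = complex(vec[i], vec[i + 1]) * cmath.exp(1j * pos * theta)
        out.extend([z.real, z.imag])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Illustrative vectors (assumed values, even dimension required).
q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.9]

# Same relative offset (3) at two different absolute positions.
score_a = dot(rope(q, 5), rope(k, 2))
score_b = dot(rope(q, 105), rope(k, 102))
print(abs(score_a - score_b) < 1e-9)
```

Because each pair is only rotated, the vector norm is preserved as well, which is one reason the rotation can be applied to queries and keys without rescaling them.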

Repository with PDF slides: https://github.com/hkproj/pytorch-llama
Download the weights from: https://github.com/facebookresearch/l...

Prerequisites:
1) Transformer explained: • Attention is all you need (Transformer) - ...
2) LLaMA explained: • LLaMA explained: KV-Cache, Rotary Position...

Chapters
00:00:00 - Introduction
00:01:20 - LLaMA Architecture
00:03:14 - Embeddings
00:05:22 - Coding the Transformer
00:19:55 - Rotary Positional Embedding
01:03:50 - RMS Normalization
01:11:13 - Encoder Layer
01:16:50 - Self Attention with KV Cache
01:29:12 - Grouped Query Attention
01:34:14 - Coding the Self Attention
02:01:40 - Feed Forward Layer with SwiGLU
02:08:50 - Model weights loading
02:21:26 - Inference strategies
02:25:15 - Greedy Strategy
02:27:28 - Beam Search
02:31:13 - Temperature
02:32:52 - Random Sampling
02:34:27 - Top K
02:37:03 - Top P
02:38:59 - Coding the Inference
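
Several of the inference strategies listed in the chapters (Greedy, Temperature, Top P) can be sketched in a few lines. The following is a minimal pure-Python sketch under stated assumptions, not the implementation shown in the video: the helper names `softmax`, `top_p_filter`, `sample_top_p`, and `greedy`, and the default values for `temperature` and `p`, are hypothetical and chosen only for illustration.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose total mass reaches p, renormalized."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return {i: probs[i] / mass for i in kept}

def sample_top_p(logits, temperature=0.6, p=0.9, rng=random):
    """Sample one token index from the temperature-scaled, top-p-filtered distribution."""
    nucleus = top_p_filter(softmax(logits, temperature), p)
    tokens = list(nucleus)
    weights = [nucleus[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

def greedy(logits):
    """Greedy strategy: always pick the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Illustrative logits over a 4-token vocabulary (assumed values).
logits = [2.0, 1.0, 0.1, -3.0]
print(greedy(logits), sample_top_p(logits, rng=random.Random(0)))
```

Top K would follow the same pattern as `top_p_filter`, keeping a fixed number of tokens instead of a fixed probability mass.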

Related videos

LLaMA explained: KV-Cache, Rotary Positional Embedding, RMS Norm, Grouped Query Attention, SwiGLU

Coding a Transformer from scratch on PyTorch, with full explanation, training and inference.

LoRA: Low-Rank Adaptation of Large Language Models - Explained visually + PyTorch code from scratch

Mistral / Mixtral Explained: Sliding Window Attention, Sparse Mixture of Experts, Rolling Buffer

Query Key Value | ONLY 7 VIDEOS YOU NEED TO UNDERSTAND Attention is all you need | Part 3

BERT explained: Training, Inference, BERT vs GPT/LLamA, Fine-tuning, [CLS] token

Rotary Positional Embeddings: Combining Absolute and Relative

Attention is all you need (Transformer) - Model explanation (including math), Inference and...

Exploring the Latency/Throughput & Cost Space for LLM Inference // Timothée Lacroix // CTO Mistral

Llama 4 Explained: Architecture, Long Context, and Native Multimodality

How the Pyramids Were Built. The Heart of the Pyramids

Quantization explained with PyTorch - Post-Training Quantization, Quantization-Aware Training

Distributed Training with PyTorch: complete tutorial with cloud infrastructure and code

How DeepSeek Rewrote the Transformer [MLA]

Jiang Xueqin: The War with Iran, a Turning Point That Changed the Middle East Forever

Earning a Master of Laws Degree: Building, Training, Fine-Tuning

Solving Interview Problems for a Python Backend Intern

Learn RAG From Scratch – Python AI Tutorial from a LangChain Engineer

Visualizing Attention, the Heart of a Transformer | Chapter 6, Deep Learning

Let's build the GPT Tokenizer
