GPU MODE
A GPU reading group and community: https://discord.gg/gpumode
Supplementary content: https://github.com/gpu-mode
Created by Mark Saroufim and Andreas Köpf
Lecture 83: Formalized Kernel Derivation
Lecture 82: Helion: A high-level DSL for ML kernels
Lecture 81: High-performance purely functional data-parallel array programming
Lecture 80: How FlashAttention 4 Works
Lecture 79: Mirage (MPK): Compiling LLMs into Mega Kernels
Lecture 78: Iris: Multi-GPU Programming in Triton
Lecture 77: Domain-Specific Languages for GPU Kernels
Lecture 76: BackendBench: Fixing the LLM kernel correctness problem
Lecture 75: [ScaleML Series] GPU Programming Fundamentals + ThunderKittens
Lecture 74: [ScaleML Series] Positional Encodings and PaTH Attention
Lecture 73: [ScaleML Series] Quantization in Large Models
Lecture 72: [ScaleML Series] Efficient and Effective Long-Context Modeling for Large...
Lecture 71: [ScaleML Series] FlexOlmo: Open Language Models for Flexible Data Use
Lecture 70: PCCL: Fault-tolerant collectives
Lecture 69: Quartet: 4-bit training
Lecture 68: Landscape of GPU-centric communication
Lecture 67: NCCL and NVSHMEM
Lecture 66: Game Arena
Lecture 65: Neighborhood Attention
Lecture 64: Multi-GPU programming
Lecture 63: Search-Based Deep Learning Compilers
Lecture 62: Exo 2: Growing a scheduling language
Lecture 61: d-Matrix Corsair
Lecture 60: Optimizing Linear Attention
Lecture 59: FastVideo
Lecture 58: Disaggregated LLM Inference
Lecture 57: CuTe
Lecture 56: Kernel Benchmarking Tales
Lecture 55: Modular’s unified device accelerator language
Lecture 54: Small RL Models at the Speed of Light with LeanRL