Hardware-aware Algorithms for Sequence Modeling - Tri Dao | Stanford MLSys #87

Author: Stanford MLSys Seminars

Uploaded: 2024-01-17

Views: 7560

Description: Episode 87 of the Stanford MLSys Seminar Series!

Hardware-aware Algorithms for Sequence Modeling
Speaker: Tri Dao

Abstract:
Transformers are slow and memory-hungry on long sequences, since the time and memory complexity of self-attention are quadratic in sequence length.
In the first half, we describe attention approximation algorithms using sparsity and low-rank structures, as well as algorithms (e.g. FlashAttention) to achieve fast and memory-efficient exact attention. By making attention algorithms IO-aware (accounting for reads and writes between levels of GPU memory) one can speed up attention by 4-8x, enabling 4-16x longer context in Transformers and yielding higher quality models. We will also describe optimizations for long-context LLM inference, leading to 2-8x faster end-to-end inference time.
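To make the IO-aware idea concrete, below is a minimal NumPy sketch (not the actual FlashAttention kernel) of exact attention computed one key/value block at a time with a running softmax, so the full L x L score matrix is never materialized; the block size and function names are illustrative assumptions.

```python
import numpy as np

def blockwise_attention(q, k, v, block=64):
    """Exact attention computed block by block with a running (online)
    softmax, so no full L x L score matrix is ever formed -- a sketch of
    the idea behind IO-aware attention, not the fused GPU kernel."""
    L, d = q.shape
    out = np.zeros_like(v)
    row_max = np.full(L, -np.inf)   # running max per query row
    row_sum = np.zeros(L)           # running softmax normalizer
    scale = 1.0 / np.sqrt(d)

    for start in range(0, L, block):
        kb = k[start:start + block]           # key block
        vb = v[start:start + block]           # value block
        s = (q @ kb.T) * scale                # partial scores (L, B)

        new_max = np.maximum(row_max, s.max(axis=1))
        correction = np.exp(row_max - new_max)   # rescale old accumulators
        p = np.exp(s - new_max[:, None])
        row_sum = row_sum * correction + p.sum(axis=1)
        out = out * correction[:, None] + p @ vb
        row_max = new_max

    return out / row_sum[:, None]

# Quick check against naive attention.
rng = np.random.default_rng(0)
L, d = 256, 32
q, k, v = rng.normal(size=(3, L, d))
s = (q @ k.T) / np.sqrt(d)
ref = np.exp(s - s.max(axis=1, keepdims=True))
ref = (ref / ref.sum(axis=1, keepdims=True)) @ v
assert np.allclose(blockwise_attention(q, k, v), ref, atol=1e-6)
```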
In the second half, we describe recent progress on subquadratic-time architectures such as RNNs, gated convolution, and structured state space models (SSMs). We identify that a key weakness of such models is their inability to perform content-based reasoning, and propose a selection mechanism to address this shortcoming. Though this change prevents the use of efficient convolutions, we design a hardware-aware parallel algorithm in recurrent mode. We integrate these selective SSMs into a simplified end-to-end neural network architecture (Mamba) without attention or even MLP blocks. Mamba matches or exceeds the performance of strong modern Transformers on language modeling.
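As a rough illustration of the selection mechanism, here is a minimal NumPy sketch of a selective state-space recurrence in which the step size and the B/C projections depend on the current input; all parameter names and shapes are hypothetical, and real implementations replace this Python loop with a fused, hardware-aware scan kernel.

```python
import numpy as np

def selective_ssm(x, A, W_B, W_C, W_dt):
    """Toy selective state-space recurrence in the spirit of Mamba:
    the step size dt and the projections B, C are functions of the
    input x, so the state update can choose what to keep per step.
    Parameter names are hypothetical; this loop is for illustration."""
    L, d = x.shape                # sequence length, channels
    n = A.shape[1]                # state size per channel
    h = np.zeros((d, n))          # hidden state
    y = np.zeros_like(x)

    for t in range(L):
        xt = x[t]                                    # (d,)
        dt = np.log1p(np.exp(xt @ W_dt))             # softplus step size, (d,)
        B = xt @ W_B                                 # input-dependent B, (n,)
        C = xt @ W_C                                 # input-dependent C, (n,)
        A_bar = np.exp(dt[:, None] * A)              # discretized decay, (d, n)
        h = A_bar * h + (dt[:, None] * B[None, :]) * xt[:, None]
        y[t] = h @ C                                 # readout, (d,)
    return y

# Toy usage with random parameters.
rng = np.random.default_rng(0)
L, d, n = 128, 16, 8
x = rng.normal(size=(L, d))
A = -np.exp(rng.normal(size=(d, n)))     # negative for stable decay
W_B = rng.normal(size=(d, n)) * 0.1
W_C = rng.normal(size=(d, n)) * 0.1
W_dt = rng.normal(size=(d, d)) * 0.1
print(selective_ssm(x, A, W_B, W_C, W_dt).shape)   # (128, 16)
```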

Bio:
Tri Dao is an incoming Assistant Professor at Princeton University and is currently chief scientist of Together AI. He completed his PhD in Computer Science at Stanford, co-advised by Christopher Ré and Stefano Ermon. He works at the intersection of machine learning and systems, and his research interests include sequence models with long-range memory and structured matrices for compact deep learning models. His work has received the ICML 2022 Outstanding paper runner-up award.

--

Stanford MLSys Seminar hosts: Avanika Narayan, Benjamin Spector, Michael Zhang

Twitter:
  / avanika15
  / bfspector
  / mzhangio

--

Check out our website for the schedule: http://mlsys.stanford.edu
Join our mailing list to get weekly updates: https://groups.google.com/forum/#!for...

#machinelearning #ai #artificialintelligence #systems #mlsys #computerscience #stanford
