Wonmin Byeon (NVIDIA), "An Alternative Architecture for Efficient Large Language Models (LLMs)"

Author: Users & Information Lab KAIST

Uploaded: 2024-07-19

Views: 260

Description: Paper: An Empirical Study of Mamba-based Language Models (https://arxiv.org/abs/2406.07887)

Widely used Large Language Models (LLMs) are based on Transformer architectures. While Transformer-based language models are highly parallelizable and can model massive amounts of data, they introduce significant computational overhead because self-attention scales quadratically with sequence length, which is especially costly on longer sequences. They also have large inference-time memory requirements from the key-value cache. More recently, State Space Models (SSMs) like Mamba have been shown to offer fast, parallelizable training and fast inference. Studies show that SSMs can match or exceed the language modeling capabilities of Transformers, making them an attractive alternative. In this talk, I present the strengths and weaknesses of Mamba, Mamba-2, and Transformer models at larger scales. I also introduce a hybrid architecture consisting of Mamba-2, attention, and MLP layers. While pure SSMs match or exceed Transformers on many tasks, they lag behind Transformers on tasks that require strong copying or in-context learning abilities. In contrast, the hybrid model closely matches or exceeds the Transformer on all standard and long-context tasks and is predicted to be up to 8x faster when generating tokens at inference time.
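The scaling behaviour described above can be illustrated with some back-of-the-envelope arithmetic. The following is a minimal sketch, not taken from the talk or the paper: the function names, the layer counts, and the dimensions are all hypothetical, and the formulas are simplified (for example, the SSM estimate ignores the small convolution state that Mamba layers also keep). It only shows that the key-value cache grows linearly with sequence length, the attention-score cost grows quadratically, and a recurrent SSM state stays fixed in size.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Approximate key-value cache size for one sequence.

    Each layer stores one key and one value tensor of shape
    (seq_len, n_kv_heads, head_dim), so the size grows linearly with seq_len.
    """
    return 2 * n_layers * seq_len * n_kv_heads * head_dim * bytes_per_elem


def attention_score_flops(seq_len, n_layers, n_heads, head_dim):
    """Approximate FLOPs for the Q @ K^T products over a full sequence.

    Every token attends to every other token, so the cost is quadratic
    in seq_len (2 FLOPs per multiply-add).
    """
    return 2 * n_layers * n_heads * seq_len * seq_len * head_dim


def ssm_state_bytes(n_layers, d_inner, d_state, bytes_per_elem=2):
    """Approximate recurrent state size for an SSM-style model.

    Simplifying assumption: one (d_inner, d_state) state per layer.
    The size does not depend on how many tokens have been processed.
    """
    return n_layers * d_inner * d_state * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical model dimensions, chosen only for illustration.
    cfg = dict(n_layers=32, n_kv_heads=8, head_dim=128)
    for seq_len in (1024, 4096, 16384):
        kv = kv_cache_bytes(seq_len, **cfg)
        flops = attention_score_flops(seq_len, n_layers=32, n_heads=32, head_dim=128)
        ssm = ssm_state_bytes(n_layers=32, d_inner=8192, d_state=128)
        print(f"seq_len={seq_len:6d}  kv_cache={kv / 2**20:8.1f} MiB  "
              f"attn_score_flops={flops:.2e}  ssm_state={ssm / 2**20:6.1f} MiB")
```

Under these assumptions, quadrupling the sequence length quadruples the cache and multiplies the attention-score cost by sixteen, while the SSM state column does not change. That fixed-size state is the property behind the faster token generation the abstract attributes to SSM and hybrid models.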


