#234

Author: Data Science Gems

Uploaded: 2024-12-26

Views: 739

Description: Foundation models are applied in a broad spectrum of settings with different inference constraints, from massive multi-accelerator clusters to resource-constrained standalone mobile devices. However, the substantial costs associated with training these models often limit the number of unique model sizes that can be offered. Consequently, practitioners are compelled to select a model that may not be optimally aligned with their specific latency and cost requirements. MatFormer is a novel Transformer architecture designed to provide elastic inference across diverse deployment constraints. MatFormer achieves this by incorporating a nested Feed Forward Network (FFN) block structure within a standard Transformer model. During training, the parameters of multiple nested FFN blocks of varying sizes are optimized, enabling the extraction of hundreds of accurate smaller models without incurring additional computational costs. The efficacy of MatFormer is validated across different model classes (decoders and encoders) and modalities (language and vision), demonstrating its potential for real-world deployment. An 850M decoder-only MatFormer language model (MatLM) allows us to extract multiple smaller models spanning from 582M to 850M parameters, each exhibiting better validation loss and one-shot downstream evaluations than independently trained counterparts. Furthermore, smaller encoders extracted from a universal MatFormer-based ViT (MatViT) encoder preserve the metric-space structure for adaptive large-scale retrieval. Finally, speculative decoding with the accurate and consistent submodels extracted from MatFormer can lead to a significant reduction in inference latency.
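To make the nested FFN idea in the description concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the class name `NestedFFN`, the ReLU activation, the missing bias terms, and the widths `[2, 4, 8]` are all assumptions chosen for illustration. It only shows the property the description relies on, namely that a smaller FFN is a prefix of the hidden units of the larger one, so it can be extracted without new parameters.

```python
import random


def relu(values):
    """Element-wise ReLU (an assumed activation for this toy example)."""
    return [max(0.0, v) for v in values]


class NestedFFN:
    """Toy FFN whose smaller variants reuse the first m hidden units.

    Hypothetical illustration only; biases are omitted for brevity.
    """

    def __init__(self, d_model, d_ff, seed=0):
        rng = random.Random(seed)
        self.d_model = d_model
        self.d_ff = d_ff
        # w_in[j] holds the input weights of hidden unit j,
        # w_out[j] holds the output weights of hidden unit j.
        self.w_in = [[rng.uniform(-0.1, 0.1) for _ in range(d_model)]
                     for _ in range(d_ff)]
        self.w_out = [[rng.uniform(-0.1, 0.1) for _ in range(d_model)]
                      for _ in range(d_ff)]

    def forward(self, x, m=None):
        """Apply the FFN using only the first m hidden units."""
        m = self.d_ff if m is None else m
        if not 1 <= m <= self.d_ff:
            raise ValueError("m must be between 1 and d_ff")
        hidden = relu([sum(w * a for w, a in zip(self.w_in[j], x))
                       for j in range(m)])
        return [sum(hidden[j] * self.w_out[j][i] for j in range(m))
                for i in range(self.d_model)]

    def num_params(self, m):
        """Weights used by the width-m sub-block (input plus output)."""
        return 2 * m * self.d_model


ffn = NestedFFN(d_model=4, d_ff=8)
x = [1.0, -0.5, 0.25, 2.0]
granularities = [2, 4, 8]  # assumed nested widths, smallest to largest

# Every granularity reads from the same shared weight lists.
outputs = {m: ffn.forward(x, m) for m in granularities}
param_counts = {m: ffn.num_params(m) for m in granularities}

# "Extracting" a smaller model is just slicing the shared weights.
small = NestedFFN(d_model=4, d_ff=2)
small.w_in = ffn.w_in[:2]
small.w_out = ffn.w_out[:2]
```

In a training loop, the loss would be computed on the outputs of these nested widths so that each prefix becomes a usable model on its own; how the widths are weighted or sampled per step is left out here, since the description does not specify it.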

In this video, I talk about the following: How are the MatFormer models trained? How does MatFormer perform?

For more details, please look at https://arxiv.org/pdf/2310.07707 and https://github.com/devvrit/matformer

Kudugunta, Sneha, Aditya Kusupati, Tim Dettmers, Kaifeng Chen, Inderjit Dhillon, Yulia Tsvetkov, Hannaneh Hajishirzi, Sham Kakade, Ali Farhadi, and Prateek Jain. "MatFormer: Nested Transformer for Elastic Inference." arXiv preprint arXiv:2310.07707 (2023).

