llm-d NYC 2026 Meetup
Author: llm-d Project
Uploaded: 2026-03-12
Views: 58
Description:
Welcome to the recording of the first-ever llm-d Meetup, hosted on March 11, 2026, in New York City! This inaugural event brought together engineering leaders from IBM Research, AMD, and Red Hat to dive deep into the challenges of scaling LLM inference and the future of the open-source distributed stack.
In this session, we explore how llm-d (an open-source, full-stack solution) is establishing distributed inference as a first-class cloud-native workload. From managing the "prefill crunch" to state-aware scheduling on Kubernetes, our speakers break down the technical paths to production-ready AI.
📍 AGENDA & TIMESTAMPS
00:00 Welcome - Pete Cheslock (Red Hat)
01:49 Intro to llm-d for Open Source Distributed Inference - Carlos Costa (IBM)
35:40 Distributed LLM Serving on AMD with llm-d - Kenny Roche (AMD)
1:05:55 Scaling Wide-EP and Mixture-of-Experts (MoE) Models - Tyler Smith (Red Hat AI)
1:20:59 KV-Cache Wins: Prefix-Cache Scheduling & Offloading - Maroon Ayoub (IBM)
1:41:54 Closing & How to Get Involved with llm-d - Pete Cheslock
Carlos Costa (IBM Research) kicks off with an overview of the core challenges: hardware heterogeneity, varying request sizes, and the shift from monolithic to orchestrated inference.
Kenny Roche (AMD) discusses aligning llm-d with the ROCm stack and the performance potential of AITER kernels.
Tyler Smith (Red Hat AI) dives into Expert Parallelism (EP) and lessons learned scaling sparse models like DeepSeek-style architectures.
Maroon Ayoub (IBM Research) explains why KV cache hit rates are the most important metric for production and introduces North-South/East-West management paths.
💡 KEY TECHNICAL HIGHLIGHTS
State-Aware Scheduling: Learn how llm-d cuts latency by routing requests to the replicas that already hold the relevant KV cache, maximizing prefix-cache reuse across the cluster (see the sketch after this list).
Prefill-Decode (P/D) Disaggregation: A deep dive into separating compute-bound prefill from memory-bound decode for better latency.
Offloading Strategies: How to overcome GPU memory limits by offloading terabytes of KV cache to CPU memory and file-system-based storage.
Future Frontiers: A sneak peek at the llm-d roadmap, featuring reinforcement learning (RL) support and expansion to the SGLang inference engine.
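
For a rough intuition of what cache-aware scheduling means, here is a minimal Python sketch; it is not the actual llm-d scheduler, and all names, fields, and weights are illustrative. It scores candidate serving pods by how much of the prompt's prefix is already in their KV cache, balanced against current load:

# Toy sketch (illustrative only, not the llm-d scheduler): pick the pod with the
# best trade-off between prefix-cache overlap and queue depth.
from dataclasses import dataclass

@dataclass
class PodState:
    name: str
    cached_prefix_tokens: int   # longest prompt prefix already in this pod's KV cache
    queued_requests: int        # rough proxy for current load

def score(pod: PodState, prompt_tokens: int,
          cache_weight: float = 1.0, load_weight: float = 0.2) -> float:
    """Higher is better: reward prefix-cache hits, penalize queue depth."""
    hit_ratio = min(pod.cached_prefix_tokens, prompt_tokens) / max(prompt_tokens, 1)
    return cache_weight * hit_ratio - load_weight * pod.queued_requests

def pick_pod(pods: list[PodState], prompt_tokens: int) -> PodState:
    return max(pods, key=lambda p: score(p, prompt_tokens))

if __name__ == "__main__":
    pods = [
        PodState("decode-0", cached_prefix_tokens=4096, queued_requests=3),
        PodState("decode-1", cached_prefix_tokens=0,    queued_requests=1),
    ]
    # A long shared system prompt makes the cache-hit pod win despite its deeper queue.
    print(pick_pod(pods, prompt_tokens=5000).name)

The real scheduler weighs more signals than this, but the talks walk through why prefix-cache hit rate dominates the trade-off in production.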
🔗 JOIN THE COMMUNITY
Join the llm-d community:
🌎 https://llm-d.ai
💬 https://llm-d.ai/slack
💻 https://github.com/llm-d