New Pattern for Sailing Multi-host LLM Inference - Kante Yin, DaoCloud
Author: CNCF [Cloud Native Computing Foundation]
Uploaded: 2025-06-13
Views: 61
Description:
Don't miss out! Join us at our next Flagship Conference: KubeCon + CloudNativeCon India in Hyderabad (August 6-7), and KubeCon + CloudNativeCon North America in Atlanta (November 10-13). Connect with our current graduated, incubating, and sandbox projects as the community gathers to further the education and advancement of cloud native computing. Learn more at https://kubecon.io
New Pattern for Sailing Multi-host LLM Inference - Kante Yin, DaoCloud
Inference workloads are becoming increasingly prevalent and vital in the cloud native world, but serving them is not easy. One of the biggest challenges is that large foundation models such as Llama 3.1-405B or DeepSeek R1 cannot fit on a single node. This calls for distributed inference with model parallelism, which in turn makes serving inference workloads even more complicated.
LeaderWorkerSet (LWS) is a dedicated multi-host inference project that aims to solve this problem, developed under the guidance of Kubernetes SIG Apps and the Serving Working Group. It offers features such as a dual template for the different types of Pods, fine-grained rolling update strategies, topology management, and all-or-nothing failure handling.
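To make those features concrete, here is a minimal sketch of a LeaderWorkerSet manifest. It assumes the leaderworkerset.x-k8s.io/v1 API; the name, image references, group size, and update parameters are illustrative placeholders, so consult the LWS documentation for the authoritative field set.

# A minimal LeaderWorkerSet sketch: two groups, each with one leader
# and three workers. Images and sizes are hypothetical examples.
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: llm-inference
spec:
  replicas: 2                      # number of leader-worker groups
  leaderWorkerTemplate:
    size: 4                        # pods per group: 1 leader + 3 workers
    restartPolicy: RecreateGroupOnPodRestart  # all-or-nothing failure handling
    leaderTemplate:                # dedicated template for the leader pod
      spec:
        containers:
        - name: leader
          image: example.com/inference-leader:latest  # hypothetical image
    workerTemplate:                # separate template for worker pods
      spec:
        containers:
        - name: worker
          image: example.com/inference-worker:latest  # hypothetical image
  rolloutStrategy:
    type: RollingUpdate
    rollingUpdateConfiguration:    # fine-grained rolling update controls
      maxUnavailable: 1
      maxSurge: 1

The dual template is the key idea: the leader pod (which typically coordinates the group and exposes the serving endpoint) can run a different image and command than the worker pods, while the group is still scheduled, updated, and restarted as a unit.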
In this session, we'll introduce the capabilities of LWS, showcase practices from adopters such as NVIDIA and Google, and demonstrate integration with the most popular inference engines, such as vLLM and SGLang.
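As a rough sketch of what such an integration can look like, the leader pod typically starts the engine while the workers join it. The container commands below follow the common Ray-based multi-node pattern for vLLM and assume the LWS_LEADER_ADDRESS environment variable that LWS injects into group pods; the model name, parallelism sizes, and port are placeholders, not a definitive deployment.

# Illustrative container commands for a Ray-based vLLM deployment on LWS.
# leaderTemplate container: start a Ray head, then serve the model across
# the group with tensor + pipeline parallelism (sizes are examples only).
command:
- sh
- -c
- ray start --head --port=6379 &&
  vllm serve meta-llama/Llama-3.1-405B --tensor-parallel-size 8 --pipeline-parallel-size 2

# workerTemplate container: join the leader's Ray cluster and block,
# using the leader address that LWS exposes to every pod in the group.
command:
- sh
- -c
- ray start --address=$(LWS_LEADER_ADDRESS):6379 --block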