Lightning Talk: Introducing LLM Instance Gateways for Efficient I... Abdel Sghiouar & Daneyon Hansen

Author: CNCF [Cloud Native Computing Foundation]

Uploaded: 2025-04-17

Views: 400

Description: Don't miss out! Join us at our next Flagship Conference: KubeCon + CloudNativeCon events in Hong Kong, China (June 10-11); Tokyo, Japan (June 16-17); Hyderabad, India (August 6-7); Atlanta, US (November 10-13). Connect with our current graduated, incubating, and sandbox projects as the community gathers to further the education and advancement of cloud native computing. Learn more at https://kubecon.io

Lightning Talk: Introducing LLM Instance Gateways for Efficient Inference Serving - Abdel Sghiouar, Google Cloud & Daneyon Hansen, solo.io

Large Language Models (LLMs) are revolutionizing applications, but serving them efficiently in production is a challenge. Existing API endpoints, load balancers, and gateways focus on HTTP/gRPC traffic, which is already a well-defined space. LLM traffic is different: a request to an LLM is usually characterized by the size of the prompt, the size and efficiency of the model, and so on.

Why are LLM Instance Gateways important? They solve the problem of efficiently managing and serving multiple LLM use cases with varying demands on shared infrastructure.

What will you learn? The core challenges of LLM inference serving: the complexities of deploying and managing LLMs in production, including resource allocation, traffic management, and performance optimization.

We will dive into how LLM Instance Gateways work, how they route requests, manage resources, and ensure fairness among different LLM use cases.
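The description does not say which routing algorithm the gateway uses, so the following is only a minimal sketch of the general idea that LLM requests differ in cost by prompt size. It assumes a hypothetical `Replica` record with a `pending_tokens` load metric and a crude word-count cost estimate; none of these names come from the talk or from any real gateway API.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical view of one model server replica (not a real API)."""
    name: str
    pending_tokens: int = 0  # assumed load metric: prompt tokens queued or in flight


def estimate_tokens(prompt: str) -> int:
    """Rough cost proxy: whitespace-separated words, not a real tokenizer."""
    return max(1, len(prompt.split()))


def route(prompt: str, replicas: list[Replica]) -> Replica:
    """Send the request to the replica with the least pending token load."""
    target = min(replicas, key=lambda r: r.pending_tokens)
    target.pending_tokens += estimate_tokens(prompt)
    return target


replicas = [Replica("a"), Replica("b")]
first = route("summarize this very long document " * 50, replicas)
second = route("hi", replicas)
third = route("hello there", replicas)
print(first.name, second.name, third.name)
```

In this sketch the long first prompt lands on replica `a`, and both short prompts that follow go to `b`, whereas a request-count round robin would have sent the third request back to the already loaded `a`.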

Related videos

Serving the Future: KServe’s Next Chapter Hosting LLMs & GenAI Models... Alexa Griffith & Tessa Pham

Harnessing the Power of Envoy Proxy for Building an LLM Gateway - Idit Levine, Solo.io

Istio Project Updates: AI Inference, Ambient Multicluster & Default Deny - Keith Mattix, Microsoft

AI Inference Without Boundaries: Dynamic Routing With Multi-Cluster In... Rob Scott & Daneyon Hansen

39c3 1743 eng deu Learning from South Korean Telco Breaches sd

RAG Crash Course for Beginners

Frontier Models & AI | Sam Altman, CEO & Co-Founder, OpenAI

Kubernetes: In Plain Language with a Clear Example

Scaling AI Agents in Kubernetes: MCP and A2A with agentgateway and kgateway

We Are on the Brink of a New Conflict! What Awaits Us Next? Andrey Bezrukov on the USA, Russia, and the Crisis

12-Factor Agents: Patterns of Reliable LLM Applications - Dex Horthy, HumanLayer

OpenAI Is Slowing Hiring. Anthropic's Engineers Stopped Writing Code. Here's Why You Should Care.

Why Did Poland Buy a Thousand Korean Tanks Instead of Abrams and Leopards?

Why MCP Really Matters | Model Context Protocol with Tim Berglund

Are Sysadmins No Longer Needed? Gemini Configures a Linux Server and Installs the N8N Stack. IS THIS LEGAL?

Turn ANY File into LLM Knowledge in SECONDS

LLMs and GPT: How Do Large Language Models Work? A Visual Introduction to Transformers

Access AI Models Anywhere: Scaling AI Traffic With Envoy AI Gateway - Dan Sun & Takeshi Yoneda

Music for Working at the Computer | Background Music for Concentration and Productivity

How to Start Using Obsidian the RIGHT Way (A Beginner's Guide)