Yes you can run LLMs on Kubernetes | Cloud Native Denmark 2025 Aarhus
Author: Cloud Native Nordics
Uploaded: 2025-12-31
Views: 43
Description:
As LLMs become increasingly powerful and ubiquitous, the need to deploy and scale these models in production environments grows. However, the complexity of LLMs can make them challenging to run reliably and efficiently. In this talk, we'll explore how Kubernetes can be leveraged to run LLMs at scale. We'll cover the key considerations and best practices for packaging LLM inference services as containerized applications using popular OSS inference servers like TGI, vLLM, and Ollama, and deploying them on Kubernetes. This includes managing model weights, handling dynamic batching and scaling, implementing advanced traffic routing, and ensuring high availability and fault tolerance. Additionally, we'll discuss accelerator management and serving models across multiple hosts. By the end of this talk, attendees will have a comprehensive understanding of how to successfully run their LLMs on Kubernetes, unlocking the benefits of scalability, resilience, and DevOps-friendly deployments.
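To make the abstract's themes concrete, here is a minimal sketch of deploying a containerized vLLM inference server on Kubernetes, written with the official Kubernetes Python client. It is not material from the talk: the image tag, model name, probe timings, and resource values are illustrative assumptions, and a real deployment would also address weight caching, autoscaling, and routing.

```python
# A minimal sketch, assuming the official `kubernetes` Python client and the
# public vllm/vllm-openai container image. Model name and timings are
# illustrative, not from the talk.
from kubernetes import client, config

def build_vllm_deployment() -> client.V1Deployment:
    container = client.V1Container(
        name="vllm",
        image="vllm/vllm-openai:latest",  # pin a specific tag in production
        args=["--model", "mistralai/Mistral-7B-Instruct-v0.2"],  # hypothetical model
        ports=[client.V1ContainerPort(container_port=8000)],
        # Request one GPU; requires the NVIDIA device plugin on the cluster.
        resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
        # Gate traffic until the weights are loaded and the server reports healthy.
        readiness_probe=client.V1Probe(
            http_get=client.V1HTTPGetAction(path="/health", port=8000),
            initial_delay_seconds=60,  # weight loading can take minutes
            period_seconds=10,
        ),
    )
    template = client.V1PodTemplateSpec(
        metadata=client.V1ObjectMeta(labels={"app": "vllm"}),
        spec=client.V1PodSpec(containers=[container]),
    )
    return client.V1Deployment(
        api_version="apps/v1",
        kind="Deployment",
        metadata=client.V1ObjectMeta(name="vllm"),
        spec=client.V1DeploymentSpec(
            replicas=1,
            selector=client.V1LabelSelector(match_labels={"app": "vllm"}),
            template=template,
        ),
    )

if __name__ == "__main__":
    config.load_kube_config()  # authenticate via the local kubeconfig
    client.AppsV1Api().create_namespaced_deployment(
        namespace="default", body=build_vllm_deployment()
    )
```

In practice you would put a Service in front of the Deployment, scale replicas with an autoscaler keyed to GPU or request metrics, and route clients to the OpenAI-compatible API the server exposes on port 8000.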
Cloud Native Denmark is a premier tech conference where the Kubernetes and Cloud Native community comes together for an experience packed with inspiring talks, hands-on workshops, and great opportunities to build professional networks.
🚀 CND Website: https://cloudnativedenmark.dk/
🚀 CND 2025 Conference Archive: https://2025.cloudnativedenmark.dk/