The future of AI: Distributing inference beyond a few GPUs
Author: Red Hat
Uploaded: 2025-07-30
Views: 1549
Description:
How do you run an AI model with a million-token context? 🕸️ Chris Wright and Nick Hill discuss the future of AI scaling, covering distributed inference, splitting tasks across different hardware, and the challenge of compressing the KV cache for massive models.
Explore the future of enterprise AI in the full Technically Speaking episode, now on YouTube!
#DistributedInference #LLM #AI #vLLM #llm-d #RedHat