vLLM: High-performance serving of LLMs using open-source technology
Author: AI Infra Forum
Uploaded: 2025-03-13
Views: 1272
Description: Research Scientist Thomas Parnell of IBM provides an overview of vLLM, an open-source project for high-performance inference and serving of large language models (LLMs). At IBM, we use vLLM extensively in production and are active contributors to the project. In this talk, I'll start with a high-level overview of vLLM, its key technical capabilities, and the community that has grown around it. I'll then cover some recent trends in LLMs and their usage (long context, agents, test-time scaling, diverse hardware), and how vLLM is evolving to support these new use cases.