How DigitalOcean Builds Next-Gen Inference with Ray, vLLM & More | Ray Summit 2025

Author: Anyscale

Uploaded: 2025-12-01

Views: 39

Description: At Ray Summit 2025, Yogesh Sharma, Boopathy Kannappan, and Debarshi Raha from DigitalOcean share how they built a robust, scalable inference platform for next-generation generative models—powered by Ray and vLLM, running on Kubernetes, and optimized for both serverless and dedicated GPU workloads.

They begin by outlining the rising complexity of inference as models grow in size, context length, and modality. Meeting real-world performance and reliability requirements demands a platform that can scale elastically, manage GPU resources intelligently, and handle dynamic workloads efficiently.

The speakers introduce DigitalOcean’s inference architecture, showing how:

Ray’s scheduling primitives ensure reliable execution across distributed clusters
Placement groups guarantee GPU affinity and predictable performance
Ray observability tools enable deep insight into system health and workload behavior
vLLM provides fast token streaming, optimized batching, and advanced memory/KV-cache management
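As a rough illustration of the GPU-affinity point above, the sketch below reserves co-located GPU bundles with a Ray placement group and pins workers to them. The `InferenceWorker` actor is a hypothetical stand-in for a replica that would host a vLLM engine; this is not DigitalOcean's actual code, just a minimal example of the Ray primitives named in the talk.

```python
import ray
from ray.util.placement_group import placement_group
from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

ray.init()

# Reserve two co-located GPU bundles so both workers land on the same
# node (STRICT_PACK), giving predictable GPU affinity.
pg = placement_group([{"GPU": 1, "CPU": 4}] * 2, strategy="STRICT_PACK")
ray.get(pg.ready())  # block until the reservation is fulfilled

@ray.remote(num_gpus=1)
class InferenceWorker:
    """Hypothetical stand-in for a replica hosting a vLLM engine."""
    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"  # a real worker would stream tokens from vLLM

workers = [
    InferenceWorker.options(
        scheduling_strategy=PlacementGroupSchedulingStrategy(
            placement_group=pg, placement_group_bundle_index=i
        )
    ).remote()
    for i in range(2)
]
print(ray.get(workers[0].generate.remote("hello")))
```

STRICT_PACK keeps all bundles on one node, which matters when tensor-parallel shards communicate over fast local interconnects; a SPREAD strategy would trade that locality for fault isolation.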
Serverless and Dedicated Inference Modes

They explore two key operational modes:

Serverless inference for automatic scaling, burst handling, and cost efficiency
Dedicated inference for fine-grained GPU partitioning, custom quantization pipelines, and performance isolation

This dual-mode architecture allows DigitalOcean to serve diverse customer workloads while maintaining reliability and performance under varying traffic patterns.
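As a hedged sketch of how these two modes can look when Ray Serve is the serving layer (an assumption; the talk does not spell out the API), the example below contrasts an autoscaled deployment that can scale to zero with a fixed-replica deployment on reserved GPUs. Autoscaling field names vary across Ray versions; `target_ongoing_requests` is the newer spelling.

```python
from ray import serve

# "Serverless" style: replica count follows traffic, down to zero when idle.
@serve.deployment(
    autoscaling_config={
        "min_replicas": 0,           # scale to zero between bursts
        "max_replicas": 8,           # cap burst capacity
        "target_ongoing_requests": 16,
    },
    ray_actor_options={"num_gpus": 1},
)
class ServerlessLLM:
    def __call__(self, prompt: str) -> str:
        return f"generated for: {prompt}"  # placeholder for a vLLM call

# "Dedicated" style: a fixed replica count on reserved GPUs for isolation.
@serve.deployment(num_replicas=2, ray_actor_options={"num_gpus": 1})
class DedicatedLLM:
    def __call__(self, prompt: str) -> str:
        return f"generated for: {prompt}"

serve.run(ServerlessLLM.bind(), name="serverless", route_prefix="/serverless")
serve.run(DedicatedLLM.bind(), name="dedicated", route_prefix="/dedicated")
```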

Advanced Optimization for Long-Context Models

The team then discusses their ongoing initiatives to improve inference for models with contexts exceeding 8k tokens, including:

Dynamic batching by token length
KV cache reuse strategies
Speculative decoding to improve latency and throughput without sacrificing accuracy
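To make the first item concrete, here is a minimal, hypothetical sketch of dynamic batching by token length: requests are bucketed by prompt length so a batch never mixes very short and very long sequences, which keeps padding waste and per-batch token budgets under control. All function and parameter names are illustrative, not DigitalOcean's scheduler.

```python
from collections import defaultdict

def bucket_by_length(requests, bucket_size=512, max_batch_tokens=8192):
    """Group requests into batches of similar prompt length.

    `requests` is a list of (request_id, token_ids) pairs. Short prompts
    are never padded out to the length of a much longer neighbor, and
    each batch stays under a total token budget.
    """
    buckets = defaultdict(list)
    for req_id, tokens in requests:
        buckets[len(tokens) // bucket_size].append((req_id, tokens))

    batches = []
    for _, reqs in sorted(buckets.items()):  # shortest buckets first
        batch, batch_tokens = [], 0
        for req_id, tokens in reqs:
            if batch and batch_tokens + len(tokens) > max_batch_tokens:
                batches.append(batch)
                batch, batch_tokens = [], 0
            batch.append(req_id)
            batch_tokens += len(tokens)
        if batch:
            batches.append(batch)
    return batches
```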
Roadmap: Multimodal, Multi-Tenant, and Unified Orchestration

Finally, they present their roadmap for a fully multimodal, multi-tenant inference platform, featuring:

Concurrent model orchestration

Tenant isolation and security-aware billing

A vision for a centralized orchestration layer with Ray as the control plane

A unified model registry for intelligent model placement, prioritization, and lifecycle management
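As a loose illustration of what a unified model registry could track (the talk does not describe its schema), the toy sketch below keeps per-model metadata and does naive best-fit placement across GPU pools. Every class and field name here is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ModelEntry:
    name: str
    version: str
    modality: str                 # "text", "image", "audio", ...
    gpu_mem_gb: float             # footprint used for placement decisions
    priority: int = 0             # higher = preferred during contention
    tenants: set[str] = field(default_factory=set)

class ModelRegistry:
    """Toy registry: per-model metadata plus naive best-fit placement."""

    def __init__(self, pools: dict[str, float]):
        self.pools = pools        # pool name -> free GPU memory (GB)
        self.entries: dict[str, ModelEntry] = {}

    def register(self, entry: ModelEntry) -> None:
        self.entries[f"{entry.name}:{entry.version}"] = entry

    def place(self, key: str) -> str | None:
        entry = self.entries[key]
        # Best fit: choose the pool that leaves the least memory unused.
        candidates = {
            pool: free - entry.gpu_mem_gb
            for pool, free in self.pools.items()
            if free >= entry.gpu_mem_gb
        }
        return min(candidates, key=candidates.get) if candidates else None
```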

This talk is designed for AI infrastructure engineers building scalable inference systems—whether you're optimizing cutting-edge production stacks or just beginning to architect your own.

Attendees will leave with a clear understanding of how to build future-ready inference platforms capable of serving large, dynamic, multimodal generative models at scale.

Liked this video? Check out the other Ray Summit breakout session recordings in the "Ray Summit 2025 - Breakout Sessions" playlist.

Subscribe to our YouTube channel (/anyscale) to stay up-to-date on the future of AI!

🔗 Connect with us:
LinkedIn: /joinanyscale
