Inside the Voyage AI Platform | MongoDB.local San Francisco 2026
Author: MongoDB
Uploaded: 2026-02-04
Views: 327
Description:
Watch more from .local San Francisco → MongoDB.local San Francisco 2026
Subscribe to MongoDB YouTube → https://mdb.link/subscribe
This talk takes you inside the Voyage Serving Platform, exploring how routing, indexing, and query optimizations deliver low-latency, high-reliability inference for embedding and reranking models at scale. You’ll learn the key design principles behind these systems, see real examples of performance optimization, and walk away with insights for applying similar techniques in your own production environments.
00:00:00 - Introduction to the Voyage AI Platform
00:00:26 - Key Differences: Embeddings vs. Rerankers
00:01:43 - Solving the Latency vs. Throughput Tension
00:04:19 - Dynamic Query Batching for GPU Efficiency
00:07:04 - Request Unbatching & Parallel Execution
00:09:55 - Autoscaling for Bursty Traffic Patterns
00:11:14 - Building Warm GPU Pools for Faster Scaling
00:14:05 - Solving the "Cold Start" Problem
00:15:31 - Multi-Tier Model Weight Caching
00:17:53 - GPU Performance: Sequence Packing & Padding
00:18:59 - Kernel Fusion & Roofline Analysis
00:20:23 - Reducing Kernel Launch & Python Overhead
Visit Mongodb.com → https://mdb.link/MongoDB
Read the MongoDB Blog → https://mdb.link/Blog
Read the Developer Blog → https://mdb.link/developerblog