AI Inference Pipelines – Building Low-Latency Systems With gRPC - Akshat Sharma, Deskree
Author: CNCF [Cloud Native Computing Foundation]
Uploaded: 2026-02-04
Views: 112
Description:
Don't miss out! Join us at our next flagship conference, KubeCon + CloudNativeCon, in Amsterdam, The Netherlands (23-26 March, 2026). Connect with our current graduated, incubating, and sandbox projects as the community gathers to further the education and advancement of cloud native computing. Learn more at https://kubecon.io
Ever tried running an AI model in production, only to see it slow down when every millisecond matters? From fraud detection to medical imaging, real-time AI systems can’t afford delays — and that’s where gRPC shines. In this session, I’ll share how we built AI inference pipelines using gRPC to handle low-latency, high-throughput communication across services. I’ll walk through the journey — what worked, what didn’t, and the lessons learned along the way. We’ll cover the architecture, the tricky performance bottlenecks, and how we scaled inference so it could keep up with real-world demand. By the end, you’ll leave with practical tips on designing fast, reliable, and production-ready AI systems powered by gRPC.
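The abstract's core claim is that gRPC's HTTP/2 transport and binary protobuf framing keep per-request overhead low enough for real-time inference. As a rough illustration (not the speaker's code), below is a minimal unary inference endpoint in Python using grpcio's generic handlers, which runs without protoc-generated stubs; the service name inference.Inference, the method Predict, and the byte-passthrough "model" are all hypothetical placeholders:

```
# Minimal sketch of a low-latency gRPC inference endpoint (assumed names,
# placeholder model). Generic handlers let this run without codegen:
# request and response travel as raw bytes.
from concurrent import futures

import grpc

SERVICE = "inference.Inference"  # hypothetical service name
METHOD = "Predict"               # hypothetical method name


def predict(request: bytes, context: grpc.ServicerContext) -> bytes:
    # Placeholder "model": a real pipeline would run inference here.
    return request.upper()


def serve() -> grpc.Server:
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=8))
    # Register the handler under /inference.Inference/Predict. With no
    # (de)serializers configured, gRPC passes bytes through untouched.
    handler = grpc.method_handlers_generic_handler(
        SERVICE,
        {METHOD: grpc.unary_unary_rpc_method_handler(predict)},
    )
    server.add_generic_rpc_handlers((handler,))
    server.add_insecure_port("localhost:50051")
    server.start()
    return server


if __name__ == "__main__":
    server = serve()
    # Client side: one unary call over a reused HTTP/2 channel.
    with grpc.insecure_channel("localhost:50051") as channel:
        predict_rpc = channel.unary_unary(f"/{SERVICE}/{METHOD}")
        print(predict_rpc(b"fraud-score: txn-123"))
    server.stop(None)
```

In a production pipeline the handler would invoke the model, and real services would define a .proto contract with generated stubs; streaming RPCs and long-lived channel reuse are the usual next steps for the throughput the talk describes.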