Stop Recomputing: Semantic Caching & Best Practices for AI Apps | Unlocked Conf - San Jose
Author: Momento
Uploaded: 2026-03-05
Views: 10
Description:
How do you scale agentic AI without letting latency and cost explode?
In this session, Chai Nuthalapati breaks down a real-world retail assistant example to show why sequential LLM and tool calls quickly stack up into multi-second response times. Instead of relying on traditional exact-match caching, they introduce semantic caching, which uses vector embeddings to match meaning, not just text.
With native vector search in Valkey and orchestration through LangGraph, similar user queries can be served directly from cache in milliseconds. The result: up to 70% cache hit rates, dramatically lower latency during traffic spikes like Black Friday, and cost savings of more than 60%.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
About Unlocked Conference
Unlocked brings together infrastructure and platform engineers to share practical lessons in scaling high-performance systems with Valkey and modern real-time data infrastructure. Learn more at https://www.unlockedconf.io/
Follow #UnlockedConf for updates on social
👉 Join the Valkey Community Slack: https://www.unlockedconf.io/
👉 Follow Valkey on LinkedIn: / valkey
👉 Follow Momento on LinkedIn: / gomomento