Stop Recomputing: Semantic Caching & Best Practices for AI Apps | Unlocked Conf - San Jose
Author: Momento
Uploaded: 2026-03-05
Views: 10
Description:
How do you scale agentic AI without letting latency and cost explode?
In this session, Chai Nuthalapati breaks down a real-world retail assistant example to show why sequential LLM and tool calls quickly stack up into multi-second response times. Instead of relying on traditional exact-match caching, they introduce semantic caching, which uses vector embeddings to match meaning, not just text.
With native vector search in Valkey and orchestration through LangGraph, similar user queries can be served directly from cache in milliseconds. The result: up to 70% cache hit rates, dramatically lower latency during traffic spikes like Black Friday, and cost savings of more than 60%.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
About Unlocked Conference
Unlocked brings together infrastructure and platform engineers to share practical lessons in scaling high-performance systems with Valkey and modern real-time data infrastructure. Learn more at https://www.unlockedconf.io/
Follow #UnlockedConf for updates on social
👉 Join the Valkey Community Slack: https://www.unlockedconf.io/
👉 Follow Valkey on LinkedIn: / valkey
👉 Follow Momento on LinkedIn: / gomomento