The Real Bottleneck in AI. Weka’s Val Bercovici on Tokenomics, Memory, and the Future of Inference
Author: IgniteGTM
Uploaded: 2025-12-04
Views: 37
Description:
📍 Recorded live at AI INFRA SUMMIT 4, Convene San Francisco
AI is advancing fast, but the economics behind it are hitting a hard wall. In this fireside chat, Val Bercovici, Chief AI Officer at Weka, joins Keith Newman to break down the emerging discipline of tokenomics and the deeper system bottleneck driving today’s AI costs: GPU prefill and memory scarcity.
Val explains why so many AI experiments fail when they hit production scale, how developers are running into shocking token bills, and why KV cache pressure and prefill limits are becoming the defining constraints for inference. He also explores shifting GPU supply, energy scarcity, and what 2026 might hold for agents, reinforcement learning, and next-generation architectures.
Highlights from the session:
Why tokenomics is becoming the deciding factor between AI success and failure
The role of memory in cost, performance, and the constraints behind prompt caching
GPU scarcity, energy limits, and why cloud bills are exploding for AI-native apps
The fundamental bottleneck of GPU prefill and how the industry is responding
What enterprises need from AI providers and how Weka stays hardware agnostic
Predictions for 2026 as agents mature from supervised interns to trusted autonomous systems
📣 Super early bird available — sign up for the next AI INFRA SUMMIT → https://luma.com/aiinfra5