The Real Bottleneck in AI. Weka’s Val Bercovici on Tokenomics, Memory, and the Future of Inference

Author: IgniteGTM

Uploaded: 2025-12-04

Views: 37

Description: 📍 Recorded live at AI INFRA SUMMIT 4, Convene San Francisco

AI is advancing fast, but the economics behind it are hitting a hard wall. In this fireside chat, Val Bercovici, Chief AI Officer at Weka, joins Keith Newman to break down the emerging discipline of tokenomics and the deeper system bottleneck driving today’s AI costs: GPU prefill and memory scarcity.

Val explains why so many AI experiments fail when they hit production scale, how developers are running into shocking token bills, and why KV cache pressure and prefill limits are becoming the defining constraints for inference. He also explores shifting GPU supply, energy scarcity, and what 2026 might hold for agents, reinforcement learning, and next-generation architectures.

Highlights from the session:
Why tokenomics is becoming the deciding factor between AI success and failure
The role of memory in cost, performance, and the constraints behind prompt caching
GPU scarcity, energy limits, and why cloud bills are exploding for AI-native apps
The fundamental bottleneck of GPU prefill and how the industry is responding
What enterprises need from AI providers and how Weka stays hardware agnostic
Predictions for 2026 as agents mature from supervised interns to trusted autonomous systems

📣 Super early bird available — sign up for the next AI INFRA SUMMIT → https://luma.com/aiinfra5
