d-Matrix - Ultra-low Latency Batched Inference for Gen AI

Author: Neil C. Hughes

Uploaded: 2026-03-07

Views: 62

Description: What happens when the real bottleneck in artificial intelligence is no longer training models, but actually running them at scale?

In this episode of Tech Talks Daily, I sit down with Satyam Srivastava from d-Matrix to explore a shift that is quietly reshaping the entire AI infrastructure landscape. While much of the early AI race focused on training ever larger models, the next phase of AI adoption is increasingly defined by inference. That is the moment when trained models are deployed and used to generate real-world results millions of times a day.

Satyam brings a unique perspective shaped by years of experience in signal processing, machine learning, and hardware architecture, including time spent at NVIDIA and Intel working on graphics, media technologies, and AI systems. Now at d-Matrix, he is helping design next-generation computing architectures focused on one of the biggest challenges facing the AI industry today: efficiently running large language models without overwhelming data centers with unsustainable power and infrastructure demands.

During our conversation, we explore why the industry underestimated the infrastructure implications of inference at scale. While training large models grabs headlines, the real operational pressure often comes later, when those models must serve millions of queries in real time. That shift places enormous strain on memory bandwidth, energy consumption, and data movement inside modern data centers.

Satyam explains how d-Matrix identified this challenge years before generative AI exploded into the mainstream. Instead of focusing on training hardware like many AI startups at the time, the company concentrated on inference efficiency. That decision is becoming increasingly relevant as organizations begin to realize that simply adding more GPUs to data centers is not a sustainable long-term strategy.

We also discuss the growing power constraints surrounding AI infrastructure, and why efficiency-driven design may be the only realistic path forward. With electricity supply, cooling capacity, and semiconductor availability all becoming limiting factors, the industry is being forced to rethink how AI systems are architected. Custom silicon, purpose-built accelerators, and heterogeneous computing environments are now emerging as key pieces of the puzzle.

The conversation also touches on the geopolitical and economic importance of AI semiconductor leadership, and why the relationship between frontier AI labs, infrastructure providers, and chip designers is becoming increasingly strategic. As governments and companies compete to maintain technological leadership, the question of who controls the hardware powering AI may prove just as important as the models themselves.

Looking ahead, Satyam shares his perspective on how the role of engineers will evolve as AI infrastructure becomes more specialized and energy-aware. Foundational engineering skills remain essential, but the next generation of engineers will also need to think in terms of entire systems, combining software, hardware, and AI tools to build more efficient computing environments.

As AI continues to move from research labs into everyday products and services, are organizations prepared for the infrastructure shift that comes with an inference-driven future? And could efficiency, rather than raw computing power, become the defining metric of the next phase of the AI race?
