Stop Treating LLMs Like REST APIs - Jeff Fran & Jack Pearce - NDC London 2026
Author: NDC Conferences
Uploaded: 2026-03-09
Views: 794
Description:
This talk was recorded at NDC London in London, England. #ndclondon #ndcconferences #developer #softwaredeveloper
Attend the next NDC conference near you:
https://ndcconferences.com
https://ndclondon.com/
#ai #architecture #cloud
Why do PoCs run smoothly while launch day implodes?
Because LLM traffic is a streaming, state-heavy beast that breaks every REST assumption: requests aren’t stateless, payloads snowball with context, and GPU memory melts under token floods. We’ll map the three checkpoints where most projects stall—context explosion, batch backfires, cache chaos—and show how LLM-D’s open-source sharding plus a hybrid NVIDIA/AMD node pool turns each choke point into a green light. You’ll see live before-and-after dashboards, get a YAML ladder you can drop into any cluster, and learn a back-of-the-napkin formula to keep cost per 1 000 tokens under control.