Experimentation at Scale: Upwork’s VP of Engineering on Blast Radius, CPQI, and AI-Driven Ops
Author: GrowthBook
Uploaded: 2026-03-11
Summary
What do you test rigorously—and what do you ship fast and fix forward—when every change could impact millions?
Vinoj Kumar, Vice President of Engineering at Upwork, leads at the intersection of infrastructure and product, where feedback loops are longer and the blast radius is wider.
He shares a pragmatic framework for experimentation, blast radius × reversibility, that sets how much testing rigor a change deserves, and explains how he measures success in product terms: faster search, resilient marketplace trust, and developer velocity. Vinoj describes why “high engagement” can mask low-quality experiences, how his team instrumented an internal natural-language chatbot with turns-to-success and downstream signals (like fewer JIRA tickets), and how a composite metric, cost per quality inference (CPQI), aligns finance, engineering, and data science by uniting cloud costs, performance, and model accuracy.
He details where AI is already paying off (build pipelines, incident detection, testing), how to monitor model drift post-launch, and why some wins on paper must be killed in production to protect trust—like a high-hit-rate caching project that surfaced stale profile data.
Expect concrete practices: shadow traffic, slow canaries, synthetic staging that mirrors reality, feature flags, LLMs-as-judges, and the mindset to tie infrastructure to business outcomes.
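The episode doesn’t spell out the CPQI formula, so below is a minimal sketch of one plausible construction: total cloud spend divided by the number of inferences that clear both a latency bar and a quality bar. The record shape, thresholds, and function names are illustrative assumptions, not Upwork’s actual definition.

```python
from dataclasses import dataclass

@dataclass
class InferenceRecord:
    cost_usd: float    # cloud cost attributed to this inference
    latency_ms: float  # end-to-end serving latency
    quality: float     # model-quality score in [0, 1], e.g. from an eval set

def cpqi(records: list[InferenceRecord],
         max_latency_ms: float = 500.0,
         min_quality: float = 0.8) -> float:
    """Cost per quality inference: total spend divided by the count of
    inferences that cleared both the performance and quality bars.
    Thresholds here are illustrative, not from the episode."""
    total_cost = sum(r.cost_usd for r in records)
    quality_count = sum(
        1 for r in records
        if r.latency_ms <= max_latency_ms and r.quality >= min_quality
    )
    if quality_count == 0:
        return float("inf")  # all spend, zero acceptable inferences
    return total_cost / quality_count
```

Because cost, performance, and model quality all move the same number, finance, engineering, and data science can debate one dashboard instead of three.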
Timestamps
[00:45] – Guest intro: Infrastructure meets product—and why experimentation looks different
[01:36] – Deciding what to test: blast radius × reversibility; canaries, shadow traffic, ship-and-monitor
[03:09] – Defining “good”: internal dev metrics vs. marketplace outcomes—and when engagement lies
[06:23] – Case study: “Talk to Data” chatbot—thumbs, turns-to-success, and reduced JIRA tickets
[09:45] – CPQI: a composite metric for cost, performance, and model quality that breaks silos
[16:55] – AI in engineering: build-time gains, MTTR/MTTD, agentic testing, and drift monitoring
[24:06] – The caching miss: 92% hit rate, stale data, trust risks—and what to do instead
[29:12] – Career advice: balance stability with bold experiments; always link infra to business value
Takeaways
Decide testing rigor with blast radius × reversibility; reserve heavy testing for irreversible, high-impact systems (decision rule sketched below).
Measure quality by efficiency and success ratio—not raw clicks or query counts.
Instrument NL tools with “turns to success” and track downstream impact (e.g., fewer ad hoc data tickets); metric sketched below.
Build composite metrics (e.g., CPQI) to align finance, engineering, and data science around shared outcomes.
Use AI to accelerate builds, detect incidents sooner, and evaluate models; watch MTTR and MTTD.
Treat ML features as living systems: feature-flag rollouts, realistic staging, drift monitoring, and LLM-as-judge evaluations (drift check sketched below), and be willing to kill “wins” that erode trust.
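The first takeaway is essentially a two-axis decision rule: the wider the blast radius and the harder a change is to reverse, the more pre-launch rigor it earns. Here is a minimal sketch under that reading; the tiers use practices named in the episode (shadow traffic, canaries, ship-and-monitor), but the exact mapping is an assumption.

```python
def testing_rigor(blast_radius: str, reversible: bool) -> str:
    """Map blast radius × reversibility to a rollout strategy.
    Tiers use practices named in the episode; the exact mapping
    is an illustrative assumption."""
    if not reversible:
        # Irreversible changes get the heaviest treatment regardless of scope.
        return "shadow traffic + slow canary + full test suite"
    if blast_radius == "wide":    # e.g. marketplace-facing, millions of users
        return "feature flag + canary rollout"
    if blast_radius == "narrow":  # e.g. internal tool, small cohort
        return "ship and monitor, fix forward"
    raise ValueError(f"unknown blast radius: {blast_radius!r}")
```

So `testing_rigor("narrow", reversible=True)` lands in the ship-and-monitor tier, while anything irreversible is gated behind shadow traffic and a slow canary.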
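“Turns to success” from the third takeaway can be computed straight from chat logs: count user turns until the session hits a success signal. The event schema and the thumbs-up heuristic below are hypothetical.

```python
from typing import Optional

def turns_to_success(session_events: list[dict]) -> Optional[int]:
    """Count user turns before the first success signal in one chat session.
    The event shape ({"role": ..., "feedback": ...}) is a hypothetical schema;
    returns None if the session never succeeded, which is itself a signal."""
    turns = 0
    for event in session_events:
        if event.get("role") == "user":
            turns += 1
        if event.get("feedback") == "thumbs_up":
            return turns
    return None
```

Tracked alongside a downstream counter such as weekly ad hoc data tickets, a falling median turns-to-success plus falling ticket volume suggests the chatbot is actually answering questions, not just getting clicks.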
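For the last takeaway, LLM-as-judge drift monitoring can be as simple as scoring a rolling sample of production outputs and alerting when mean quality sags below the launch baseline. The judge is passed in as a callable because the episode names no provider; the signature and tolerance are illustrative assumptions.

```python
from statistics import mean
from typing import Callable

def drift_alert(outputs: list[str],
                judge: Callable[[str], float],  # hypothetical judge scoring [0, 1]
                baseline: float,
                tolerance: float = 0.05) -> bool:
    """Flag drift when the mean judge score over a sample of production
    outputs falls more than `tolerance` below the baseline recorded at
    launch. The 0.05 default is an illustrative choice."""
    current = mean(judge(o) for o in outputs)
    return current < baseline - tolerance
```

Wired to a feature flag, a tripped alert can roll the model back while the team investigates, which is the living-system posture the episode argues for.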