Lessons from building evals in Healthcare at Scale with Clara Matos
Author: Lisbon AI
Uploaded: 2026-02-16
Views: 25
Description:
AI is transforming healthcare delivery by enabling more personalized, effective and efficient care at scale. However, deploying these models in a highly regulated, safety-critical environment introduces unique challenges, especially when it comes to ensuring consistency, reliability, and alignment with clinical standards.
In this practical, example-driven talk, Clara shares how to evaluate Health AI products throughout their lifecycle: from development to deployment and continuous improvement in production.
00:15 Why evals matter in healthcare (safety, reliability, usefulness)
02:52 Why evals are crucial for deploying LLMs in production
03:19 Benefits: iteration, regression detection, model comparison, cost savings
04:11 Offline eval loop: dataset → evaluators → V-check → release
05:01 Building eval datasets: not just random samples (rare + hard + regressions + tool/RAG branches)
06:05 What an eval item looks like (context + transcript) + continuous dataset updates
06:36 Defining criteria: binary outputs (pass/fail) to reduce ambiguity
07:09 Human annotation for expert alignment (clinical/product reviewers)
08:02 Building evaluators: human vs code-based vs LLM-as-judge
08:47 Human pairwise comparisons + internal tooling for release decisions
09:15 Code-based evaluators: scalable checks (example: character limits)
09:48 LLM-as-judge: prompt design + train/dev/test split for alignment
11:21 Iterating on judge prompts: disagreement analysis → refine → validate on test set
12:41 Why manual V-check still matters (metrics can improve but quality can regress)
14:08 Online evals: continuous evaluation in production
14:29 Guardrails as real-time safety checks (code + LLM judge for critical cases)
15:42 A/B tests in LLM products: metrics, lift, stakeholder patience, statistical constraints
17:03 Manual audits: high-ROI pattern detection + root cause analysis
17:38 Observability: log inputs/metadata/outputs/RAG docs/tools/feedback/evals
18:16 Running evaluators on production traces + alerting on pass-rate drops
19:10 Closing: evals are continuous — systems must improve with every interaction
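To make a few of the chapters above concrete, here is a minimal sketch of binary pass/fail evaluators: a code-based character-limit check (09:15) and an LLM-as-judge forced to answer PASS or FAIL (06:36, 09:48). It assumes an OpenAI-style chat client; the item fields, prompt wording, limit, and model name are illustrative, not taken from the talk.

```python
# Sketch of binary (pass/fail) evaluators for an offline eval loop.
# All names and the judge prompt are illustrative assumptions.
from dataclasses import dataclass

from openai import OpenAI  # assumed client; any LLM API works the same way

client = OpenAI()


@dataclass
class EvalItem:
    context: str     # e.g. session/patient context fed to the product
    transcript: str  # the model output being evaluated


def char_limit_check(item: EvalItem, limit: int = 500) -> bool:
    """Code-based evaluator: cheap, deterministic, runs at scale."""
    return len(item.transcript) <= limit


JUDGE_PROMPT = """You are reviewing output from a health product.
Context:
{context}

Output:
{transcript}

Does the output stay consistent with the context and avoid unsupported
clinical claims? Answer with exactly one word: PASS or FAIL."""


def llm_judge_check(item: EvalItem, model: str = "gpt-4o-mini") -> bool:
    """LLM-as-judge evaluator: binary output to reduce ambiguity."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(
                context=item.context, transcript=item.transcript
            ),
        }],
    )
    return response.choices[0].message.content.strip().upper() == "PASS"


def run_offline_eval(dataset: list[EvalItem]) -> float:
    """Offline loop: an item passes only if every evaluator passes."""
    passed = sum(
        char_limit_check(item) and llm_judge_check(item) for item in dataset
    )
    return passed / len(dataset)  # pass rate to compare across releases
```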
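Next, a sketch of how judge-vs-expert alignment could be measured on a dev/test split (09:48, 11:21). `judge` is any pass/fail callable, such as `llm_judge_check` above; the 50/50 split and the refinement loop are assumptions for illustration.

```python
# Sketch of aligning an LLM judge with expert labels.
# `labelled` is a list of (item, expert_pass) tuples.
import random
from typing import Callable


def agreement(labelled: list[tuple], judge: Callable) -> float:
    """Fraction of items where the judge matches the expert pass/fail label."""
    matches = sum(judge(item) == expert_pass for item, expert_pass in labelled)
    return matches / len(labelled)


def dev_test_split(labelled: list[tuple], seed: int = 0) -> tuple[list, list]:
    """Hold out a test set so judge-prompt tweaks are validated on unseen items."""
    rng = random.Random(seed)
    shuffled = labelled[:]
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def disagreements(dev: list[tuple], judge: Callable) -> list:
    """Items where judge and expert disagree: the starting point for prompt edits."""
    return [item for item, expert_pass in dev if judge(item) != expert_pass]

# Loop: inspect disagreements(dev, judge), refine the judge prompt, re-check
# agreement on dev, and only report agreement(test, judge) once at the end.
```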
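And a sketch of the online side: re-running an evaluator over logged production traces and alerting when the pass rate drops (17:38, 18:16). The trace format, drop threshold, and alert sink are placeholders.

```python
# Sketch of pass-rate monitoring over production traces.
from statistics import fmean
from typing import Callable


def pass_rate(traces: list[dict], evaluator: Callable[[dict], bool]) -> float:
    """Share of logged traces (inputs, outputs, metadata) that pass the evaluator."""
    return fmean(1.0 if evaluator(trace) else 0.0 for trace in traces)


def alert_on_drop(
    recent_traces: list[dict],
    baseline_rate: float,
    evaluator: Callable[[dict], bool],
    max_drop: float = 0.05,  # assumed tolerance before notifying anyone
) -> bool:
    """Notify and return True when the recent pass rate falls below baseline - max_drop."""
    rate = pass_rate(recent_traces, evaluator)
    if rate < baseline_rate - max_drop:
        # Placeholder for the real alert sink (Slack, PagerDuty, dashboard, ...).
        print(f"ALERT: pass rate {rate:.1%} vs baseline {baseline_rate:.1%}")
        return True
    return False
```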
Thank you to all our partners for making this happen! A big thanks to our gold sponsors for believing in us:
Uphold:
Founded in 2013, Uphold is a digital wallet and trading platform that makes cryptocurrencies and other assets affordable and accessible to everyone.
With coverage of 300+ assets, Uphold allows users to move seamlessly between digital and traditional currencies, enabling borderless access to financial services you can’t get through your bank. Their Anything-to-Anything interface lets anyone fund, trade, or send money globally in just one tap. Check it here: https://uphold.com
DataLinks:
DataLinks recognizes the importance of nurturing the AI ecosystem. They bring ontologies and knowledge graphs to Lisbon AI, redefining data engineering and shaping the future of Agentic Workflows and Vertical Search.
Discover how scattered data can be unified and linked to power your agents and backends, all with a single click and some prompts: https://datalinks.com/
Follow Clara:
https://x.com/clarafrmatos
Follow @sword_health:
https://x.com/SwordHealth
Follow us on X:
https://x.com/lisbonai_
Follow us on LinkedIn:
/ lisbon-ai
Opening music:
@NIN