Lessons from building evals in Healthcare at Scale with Clara Matos

Author: Lisbon AI

Uploaded: 2026-02-16

Views: 25

Description: AI is transforming healthcare delivery by enabling more personalized, effective, and efficient care at scale. However, deploying these models in a highly regulated, safety-critical environment introduces unique challenges, especially when it comes to ensuring consistency, reliability, and alignment with clinical standards.

In this practical, example-driven talk, Clara shares how to evaluate Health AI products throughout their lifecycle: from development to deployment and continuous improvement in production.

00:15 Why evals matter in healthcare (safety, reliability, usefulness)
02:52 Why evals are crucial for deploying LLMs in production
03:19 Benefits: iteration, regression detection, model comparison, cost savings
04:11 Offline eval loop: dataset → evaluators → V-check → release
05:01 Building eval datasets: not just random samples (rare + hard + regressions + tool/RAG branches)
06:05 What an eval item looks like (context + transcript) + continuous dataset updates
06:36 Defining criteria: binary outputs (pass/fail) to reduce ambiguity
07:09 Human annotation for expert alignment (clinical/product reviewers)
08:02 Building evaluators: human vs code-based vs LLM-as-judge
08:47 Human pairwise comparisons + internal tooling for release decisions
09:15 Code-based evaluators: scalable checks (example: character limits)
09:48 LLM-as-judge: prompt design + train/dev/test split for alignment
11:21 Iterating on judge prompts: disagreement analysis → refine → validate on test set
12:41 Why manual V-check still matters (metrics can improve but quality can regress)
14:08 Online evals: continuous evaluation in production
14:29 Guardrails as real-time safety checks (code + LLM judge for critical cases)
15:42 A/B tests in LLM products: metrics, lift, stakeholder patience, statistical constraints
17:03 Manual audits: high-ROI pattern detection + root cause analysis
17:38 Observability: log inputs/metadata/outputs/RAG docs/tools/feedback/evals
18:16 Running evaluators on production traces + alerting on pass-rate drops
19:10 Closing: evals are continuous — systems must improve with every interaction
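
To make a few of the chapters above concrete (binary pass/fail criteria, a code-based character-limit check, judge-vs-human agreement, and alerting on pass-rate drops), here is a minimal Python sketch. It is not code from the talk: the 320-character limit, the 0.05 drop threshold, and every function name are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

MAX_CHARS = 320  # hypothetical limit; the talk description does not state a number


@dataclass
class EvalResult:
    name: str
    passed: bool  # binary pass/fail rather than a graded score


def within_char_limit(response: str, max_chars: int = MAX_CHARS) -> EvalResult:
    """Code-based evaluator: pass if the response fits the character limit."""
    return EvalResult("char_limit", len(response) <= max_chars)


def pass_rate(results: list[EvalResult]) -> float:
    """Fraction of passing results; 0.0 for an empty batch."""
    if not results:
        return 0.0
    return sum(r.passed for r in results) / len(results)


def should_alert(current: float, baseline: float, max_drop: float = 0.05) -> bool:
    """Alert when the pass rate falls more than max_drop below the baseline."""
    return (baseline - current) > max_drop


def agreement(judge: list[bool], human: list[bool]) -> float:
    """Share of items where LLM-judge labels match human expert labels."""
    if len(judge) != len(human):
        raise ValueError("label lists must have the same length")
    if not judge:
        return 0.0
    return sum(j == h for j, h in zip(judge, human)) / len(judge)


if __name__ == "__main__":
    # Two fake production traces: one short, one over the limit.
    traces = ["Remember to do your exercises today.", "x" * 400]
    results = [within_char_limit(t) for t in traces]
    rate = pass_rate(results)
    print("pass rate:", rate)
    print("alert:", should_alert(rate, baseline=0.9))
```

In a setup like this, the agreement figure would be computed on the held-out test split only after the judge prompt has been tuned on the train and dev splits, so that it reflects alignment on unseen examples.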

Thank you to all our partners for making this happen! A big thanks to our gold sponsors for believing in us:

Uphold:
Founded in 2013, Uphold is a digital wallet and trading platform that makes cryptocurrencies and other assets affordable and accessible to everyone.
With coverage of 300+ assets, Uphold allows users to move seamlessly between digital and traditional currencies, enabling borderless access to financial services you can’t get through your bank. Their Anything-to-Anything interface lets anyone fund, trade, or send money globally in just one tap. Check it here: https://uphold.com

DataLinks:
DataLinks recognizes the importance of nurturing the AI ecosystem. They bring ontologies and knowledge graphs to Lisbon AI, redefining data engineering and shaping the future of Agentic Workflows and Vertical Search.
Discover how scattered data can be unified and linked to power your agents and backends, all with a single click and some prompts: https://datalinks.com/

Follow Clara:
https://x.com/clarafrmatos

Follow @sword_health:
https://x.com/SwordHealth

Follow us on X:
https://x.com/lisbonai_

Follow us on LinkedIn:
/ lisbon-ai

Opening music:
@NIN
