Observability in AI apps. Eval Engineering for AI Developers, lesson 2 - add observability to AI

Author: Galileo

Uploaded: 2025-12-16

Views: 444

Description: Learn Eval Engineering in this free, 5-part, hands-on course presented by @jimbobbennett

90% of AI agents never make it to production. The biggest reason is that the AI engineers building these apps lack a clear way to evaluate whether the agents are doing what they should, and to use the results of that evaluation to fix them.

In this course, you will learn all about evals for AI applications. You'll start with some out-of-the-box metrics and learn about evals, then move on to understanding observability for AI apps, analyzing failure states, and defining custom metrics, and finally use all of these across your whole SDLC.

This will be hands-on, so be prepared to write some code, create some metrics, and do some homework!

In this second lesson, you will:

- Use observability to visualize the components of a typical multi-agent AI application
- Learn about the different components that make up these applications
- Apply some out-of-the-box metrics to start to get an understanding of how your application is working

Prerequisites:

- A basic knowledge of Python
- Access to an OpenAI API key
- A free Galileo account (we will be using Galileo as the evals platform). Sign up at https://galileo.ai/sign-up.
- Course materials from https://github.com/rungalileo/eval-en...

Catch the rest of the lessons here: Eval Engineering for AI Developers

0:00:10 - Introduction & Welcome
0:05:19 - Course Schedule & Overview
0:07:22 - Prerequisites & Setup
0:10:39 - Homework Review: Context Adherence
0:13:54 - Introduction to Observability
0:16:35 - Demo: Runzi Multi-Agent App
0:25:46 - What is Observability?
0:26:46 - Components of Observability: Spans, Traces, Sessions & Metrics
0:35:05 - Demo: Finding Failures with Observability
0:47:09 - Evaluations & Observability
0:54:33 - Adding Metrics: Instruction Adherence & Tone
1:19:36 - When to Add Metrics
1:23:08 - Homework: Breaking Runzi & Custom Metrics
1:26:41 - Q&A Session

