Hello Evals! Eval Engineering for AI Developers, lesson 1 - an intro to eval engineering

Author: Galileo

Uploaded: 2025-12-09

Views: 1591

Description: Learn Eval Engineering in this free, 5-part, hands-on course presented by @jimbobbennett

90% of AI agents don't make it to production. The biggest reason is that the AI engineers building these apps have no clear way to evaluate whether their agents are doing what they should, or to use the results of that evaluation to fix them.

In this course, you will learn all about evals for AI applications. You'll start with some out-of-the-box metrics and the basics of evals, then move on to observability for AI apps, failure analysis, and custom metrics, and finally to using all of these across your whole SDLC.

This will be hands-on, so be prepared to write some code, create some metrics, and do some homework!

In this first lesson, you will:

- Learn what evals are
- Learn how you can use simple evals to detect issues in an AI application
- Get hands-on adding an eval to an app
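
To make "a simple eval" concrete, here is a minimal Python sketch, not taken from the course materials. It scores an HR-chatbot-style answer on whether it sticks to the context it was given. All names (`EvalCase`, `build_judge_prompt`, `stand_in_judge`, `run_eval`) and the sample data are hypothetical. The judge here is an offline stand-in that only compares numbers, so the example runs without an API key; in an LLM-as-a-judge setup you would instead send the prompt to a model and read back its verdict.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    question: str
    context: str   # source text the app was given
    response: str  # what the app actually answered


def build_judge_prompt(case: EvalCase) -> str:
    """Format the prompt a judge model could be asked (hypothetical wording)."""
    return (
        "Answer PASS if the response is fully supported by the context, "
        "otherwise answer FAIL.\n"
        f"Context: {case.context}\n"
        f"Question: {case.question}\n"
        f"Response: {case.response}\n"
    )


def stand_in_judge(case: EvalCase) -> str:
    """Offline placeholder for a judge model (an assumption for this sketch).

    Returns PASS only if every number in the response also appears
    in the context. A real judge would call an LLM instead.
    """
    context_numbers = set(re.findall(r"\d+", case.context))
    response_numbers = set(re.findall(r"\d+", case.response))
    return "PASS" if response_numbers <= context_numbers else "FAIL"


def run_eval(cases: List[EvalCase], judge: Callable[[EvalCase], str]) -> float:
    """Return the fraction of cases the judge marks as PASS."""
    if not cases:
        return 0.0
    passed = sum(1 for case in cases if judge(case).strip().upper() == "PASS")
    return passed / len(cases)


# Made-up sample data for illustration only.
POLICY = "Full-time employees receive 25 days of annual leave."
QUESTION = "How many days of annual leave do I get?"
CASES = [
    EvalCase(QUESTION, POLICY, "You get 25 days of annual leave."),
    EvalCase(QUESTION, POLICY, "You get 30 days of annual leave."),
]

if __name__ == "__main__":
    print(f"pass rate: {run_eval(CASES, stand_in_judge):.2f}")
```

Passing the judge in as a function keeps the scoring loop unchanged when the stand-in is swapped for a real model call.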

Prerequisites:

- A basic knowledge of Python
- Access to an OpenAI API key
- A free Galileo account (we will be using Galileo as the evals platform). Sign up at https://galileo.ai/sign-up.
- Course materials from https://github.com/rungalileo/eval-en...

Catch the rest of the lessons here: Eval Engineering for AI Developers

0:00:10 - Introduction & Welcome
0:08:32 - Course Schedule & Overview
0:15:45 - The Trust Problem: Why AI Needs Testing
0:21:20 - Demo: A Simple HR Chatbot (And Why It Fails)
0:32:38 - Deterministic vs. Non-Deterministic Testing
0:34:16 - What is Eval Engineering?
0:44:40 - Building a Simple Metric (LLM as a Judge)
0:53:00 - Real World Example: Context Adherence
0:55:52 - Demo: Running Evals with Galileo
1:12:06 - Demo: Fixing the Application
1:15:32 - Summary & Homework
1:20:04 - Q&A Session
