Hello Evals! Eval Engineering for AI Developers, lesson 1 - an intro to eval engineering
Author: Galileo
Uploaded: 2025-12-09
Views: 1591
Description:
Learn Eval Engineering in this free, 5-part, hands-on course presented by @jimbobbennett
90% of AI agents don't make it to production. The biggest reason is that the AI engineers building these apps have no clear way to evaluate whether their agents are doing what they should, or to use the results of that evaluation to fix them.
In this course, you will learn all about evals for AI applications. You'll start with some out-of-the-box metrics and learn about evals, then move on to understanding observability for AI apps, analyzing failure states, and defining custom metrics, and finally to using these across your whole SDLC.
This will be hands-on, so be prepared to write some code, create some metrics, and do some homework!
In this first lesson, you will:
- Learn what evals are
- Learn how you can use simple evals to detect issues in an AI application
- Get hands-on experience adding an eval to an app
Prerequisites:
- A basic knowledge of Python
- Access to an OpenAI API key
- A free Galileo account (we will be using Galileo as the evals platform). Sign up at https://galileo.ai/sign-up.
Course materials from https://github.com/rungalileo/eval-en...
Catch the rest of the lessons here: Eval Engineering for AI Developers
0:00:10 - Introduction & Welcome
0:08:32 - Course Schedule & Overview
0:15:45 - The Trust Problem: Why AI Needs Testing
0:21:20 - Demo: A Simple HR Chatbot (And Why It Fails)
0:32:38 - Deterministic vs. Non-Deterministic Testing
0:34:16 - What is Eval Engineering?
0:44:40 - Building a Simple Metric (LLM as a Judge)
0:53:00 - Real World Example: Context Adherence
0:55:52 - Demo: Running Evals with Galileo
1:12:06 - Demo: Fixing the Application
1:15:32 - Summary & Homework
1:20:04 - Q&A Session