How We Evaluate Large Language Models | Patrycja Cieplicka | LLMday Warsaw 2026 Q1
Author: LLMday
Uploaded: 2026-03-03
Views: 10
Description:
LLMday Warsaw 2026 Q1 - February 12
Grab your ticket for the next LLMday: https://www.llmday.com
Upcoming LLMday CFPs: https://cfp.ninja/?q=llmday&status=op...
Chapters
00:00 Welcome & Speaker Intro: Evaluating Large Language Models
00:11 Two Blocks Overview: What We Build for Clients
00:36 LLM Work in E‑commerce: Adaptation, Evaluation & Optimization
01:29 Four Ways to Measure LLM Performance (Metrics Landscape)
02:24 Pros/Cons of Each Evaluation Method
03:34 Using Open-Source Benchmarks the Right Way
04:34 Benchmark Pitfalls: Overfitting, Setup Differences & Comparability
06:25 Don’t Trust Tiny Gains: Statistical Significance Checks
07:18 Building Your Own Eval: Core Principles for Real-World Apps
09:26 Evaluation-Driven Development: Iterate Evals and Models Together
10:18 Tuning the Evaluator: Human-Labeled Test Sets & Validator Drift
13:43 LLM-as-a-Judge Methods: Scoring vs Pairwise Comparisons
14:34 Prompting Best Practices for LLM Judges (and Avoiding Bias)
19:15 Wrap-Up: Keep Evals Robust, Practical, and Business-Focused
20:06 Q&A: User Feedback in Eval Frameworks + E‑commerce Use Cases
22:25 Final Thanks & Closing