Evaluating LLM-based chatbots: A framework for reliable AI assistants
Author: Conversation Design Institute
Uploaded: 2025-10-23
Views: 2269
Description:
Learn a practical framework for building test cases, choosing metrics, setting up regression tests, and adding guardrails to make LLM-powered chatbots reliable, safe, and less prone to hallucinations. This webinar also covers live monitoring strategies for keeping your chatbot reliable once it is live.
What you’ll learn:
Building test cases that reveal weak points in LLM behavior
Choosing metrics that accurately reflect performance and reliability
Setting up regression tests to safely deploy chatbot updates
Adding guardrails to minimize hallucinations and harmful outputs
Live monitoring and log analysis strategies to continuously improve performance
Find a link to the LLM evaluation library here: https://parslabs.org/resources/llm-ev...
Meet the speakers:
@LenaShakurova is the founder of ParsLabs (https://parslabs.org), a Conversational AI agency, and Chatbotly (https://chatbotly.co), a no-code platform for building AI assistants trained on custom data.
At ParsLabs, she leads a team blending AI, user research and conversation science to design and develop high-quality AI conversations that sound more human. She has a background in NLP and Artificial Intelligence, 7+ years of experience, and 100+ successful projects building production-ready chatbots and voice assistants.
Lena focuses on ethical, user-first AI, leveraging her expertise in Linguistics & AI to create responsible, high-quality AI solutions. She shares insights on AI innovation and human-centred design through her blog (https://shakurova.io/blog) and LinkedIn (/lena-shakurova).
Willem Don is one of our seasoned Conversational AI Trainers, with eight years of extensive experience in language model development and evaluation. Throughout his career, he has successfully managed AI implementations for over 40 clients, demonstrating a profound understanding of dialogue system intricacies. As a contributor to the Conversation Design Institute's AI Trainer Course, he has been instrumental in shaping the next generation of AI training methodologies.
00:00 Intro
03:53 Why we shouldn’t launch without evals
06:07 3-stage LLM evals framework
08:45 Setting up experiments for LLM-based AI Assistants
10:39 Making a good test set
17:00 LLM eval metrics
19:01 LLM-as-a-judge
30:02 Specifics of evaluating LLM-based chatbots
31:35 RAG evals
36:00 Response quality evals
37:23 Conversation structure evals
42:09 Conversation simulations
49:30 Outro
Watch more webinars here: https://learn.conversationdesigninsti...