LLM Chronicles #6.6: Hallucination Detection and Evaluation for RAG systems (RAGAS, Lynx)
Author: Donato Capitella
Uploaded: 2024-11-05
Views: 25,201
Description:
This episode covers LLM hallucinations: why they happen, how to detect them, and ways to reduce them in RAG pipelines. We'll discuss key tools such as the RAGAS metrics for measuring faithfulness, context relevance, and answer relevance, along with techniques like using LLMs as judges and embedding models to catch hallucinations. We'll also look at Lynx, a fine-tuned version of Llama-3 built to identify hallucinations and keep responses grounded.
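As a rough illustration of the embedding-based faithfulness check mentioned above (not code from the episode), the sketch below compares each sentence of a generated answer against the retrieved context and flags sentences with low similarity as potentially unsupported. It assumes the sentence-transformers package; the model name, threshold, and example strings are arbitrary choices for illustration.

# Minimal sketch of an embedding-based groundedness check.
# Each answer sentence is compared to every retrieved context chunk;
# sentences whose best cosine similarity falls below a threshold are
# flagged as possibly hallucinated.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence-embedding model works

def flag_unsupported_sentences(answer_sentences, context_chunks, threshold=0.6):
    # Encode answer sentences and context chunks into embeddings.
    answer_emb = model.encode(answer_sentences, convert_to_tensor=True)
    context_emb = model.encode(context_chunks, convert_to_tensor=True)
    # Cosine similarity matrix: rows = answer sentences, cols = context chunks.
    sims = util.cos_sim(answer_emb, context_emb)
    # Keep sentences whose best match in the context is below the threshold.
    return [
        sentence
        for i, sentence in enumerate(answer_sentences)
        if sims[i].max().item() < threshold
    ]

# Illustrative example (loosely inspired by the Air Canada refund story):
context = ["Refund requests must be submitted within 90 days of ticket purchase."]
answer = [
    "Refunds can be requested within 90 days.",
    "Bereavement fares are refundable after travel has taken place.",  # likely unsupported
]
print(flag_unsupported_sentences(answer, context))

The threshold is a tunable trade-off between missed hallucinations and false alarms; the episode discusses why LLM-as-judge approaches (RAGAS, Lynx) tend to catch cases that plain embedding similarity misses.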
Canvas Download: https://llm-chronicles.com/pdfs/llm-c...
🕤 Timestamps:
00:07 - Overview of Contents
00:46 - Hallucinations Root Cause
01:55 - RAG Pipelines
03:16 - Faithfulness / Groundedness
03:54 - RAGAS Metrics
05:33 - Tools (Embeddings, LLM-as-Judge)
06:45 - Evaluating Faithfulness with Embeddings
06:45 - Evaluating Faithfulness with LLM-as-Judge (Lynx)
07:55 - Evaluating Faithfulness with RAGAS
08:33 - Evaluating Answer Relevance
09:16 - Evaluating Context Relevance
10:34 - How to use these metrics?
11:55 - Summary
References:
WIRED: Air Canada Has to Honor a Refund Policy Its Chatbot Made Up
https://www.wired.com/story/air-canad...
RAGAS: Automated Evaluation of Retrieval Augmented Generation
https://arxiv.org/abs/2309.15217
Lynx: An Open Source Hallucination Evaluation Model
https://arxiv.org/abs/2407.08488
Alex Razvant: How to evaluate your RAG using RAGAs Framework
/ how-to-evaluate-your-rag-using-ragas-frame...
Leonie Monigatti: Evaluating RAG Applications with RAGAs
https://towardsdatascience.com/evalua...