LLM-as-a-Judge Evaluation for Dataset Experiments in Langfuse
Author: Langfuse
Uploaded: 2024-11-19
Views: 3131
Description:
🚀 Introducing LLM-as-a-judge Evaluation for Dataset Experiments in Langfuse
Learn how to reliably evaluate your LLM application changes using Langfuse's new managed LLM-as-a-judge evaluators. This feature helps teams:
• Automatically evaluate experiment runs against test datasets
• Compare metrics across different versions
• Identify regressions before they hit production
• Score outputs based on criteria like hallucination, helpfulness, relevance, and more
Evaluators work with popular LLM providers, including OpenAI, Anthropic, Azure OpenAI, and AWS Bedrock, via function calling.
🔗 Learn more at https://langfuse.com/changelog/2024-1...
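
For context, a dataset experiment run that these managed judges can then score might look like the minimal sketch below. It assumes the Langfuse Python SDK v2, a hypothetical dataset named "qa-test-set", and a placeholder my_rag_app function; the hallucination, helpfulness, or relevance scores themselves come from the evaluator you configure in the Langfuse UI, not from this code.

```python
from langfuse import Langfuse

# Assumes LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_HOST
# are set in the environment.
langfuse = Langfuse()

def my_rag_app(question: str) -> str:
    # Placeholder for the application under test (e.g. a RAG pipeline).
    return f"Answer to: {question}"

# "qa-test-set" is a hypothetical dataset created beforehand in Langfuse.
dataset = langfuse.get_dataset("qa-test-set")

for item in dataset.items:
    # item.observe() creates a trace and links it to the experiment run,
    # so the managed LLM-as-a-judge evaluator can score its output.
    with item.observe(run_name="prompt-v2-experiment") as trace_id:
        output = my_rag_app(item.input)
        # Attach input/output to the trace; an instrumented app would
        # normally record this automatically.
        langfuse.trace(id=trace_id, input=item.input, output=output)

# Ensure all buffered events are sent before the script exits.
langfuse.flush()
```

Once the run completes, the configured evaluator scores each linked trace through the chosen provider, and the scores appear alongside the run so you can compare versions and spot regressions.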