SIGIR VF - RUBRIC: Evaluating Relevance for Information Retrieval and Generation (Laura Dietz)
Author: SIGIR Virtual Forum
Uploaded: 2025-04-25
Views: 34
Description:
This work won Best Paper Awards at SIGIR 2024 and ICTIR 2023, and obtained the "best in tau" performance in the LLM-judge challenge of the LLM4Eval workshop.
Title: RUBRIC: Evaluating Relevance for Information Retrieval and Generation
Abstract: RAG systems are notoriously difficult to evaluate because their responses differ slightly every time. This makes research findings non-reproducible and datasets non-reusable. We believe that LLMs can help auto-grade what is relevant versus what is not, but we also believe it is important to incorporate human judges into this process. With RUBRIC ("Relevance Understanding by Breaking Responses Into Components") we define what is relevant for a query via a set of question-style nuggets or relevance criteria. An LLM can then automatically scan all retrieved and/or generated passages for whether they answer the nugget questions or satisfy the criteria. A system's evaluation score is higher the more nuggets its response covers. Not only does this process obtain the best performance on a range of datasets, but it also offers a straightforward path to integrating human judges into designing nuggets and overseeing the automatic grading process.
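The grading loop the abstract describes can be summarized in a few lines. Below is a minimal sketch, not the actual RUBRIC implementation: the nugget list, the yes/no grading prompt, and the `ask_llm` callable are all illustrative assumptions standing in for whatever LLM backend and prompt the real system uses.

```python
# Minimal sketch of nugget-based coverage scoring (illustrative, not the
# official RUBRIC code). `ask_llm` is a hypothetical stand-in for any
# LLM call that returns a text completion.

from typing import Callable, List


def nugget_coverage_score(
    response: str,
    nuggets: List[str],
    ask_llm: Callable[[str], str],
) -> float:
    """Return the fraction of question-style nuggets the response answers."""
    if not nuggets:
        return 0.0
    covered = 0
    for nugget in nuggets:
        # One yes/no grading call per nugget; the prompt wording is assumed.
        prompt = (
            "Does the following passage answer the question?\n"
            f"Question: {nugget}\n"
            f"Passage: {response}\n"
            "Answer YES or NO."
        )
        if ask_llm(prompt).strip().upper().startswith("YES"):
            covered += 1
    # The more nuggets covered, the higher the system's score.
    return covered / len(nuggets)
```

In this sketch, a human judge can intervene at two points, matching the abstract: by authoring or editing the `nuggets` list up front, and by auditing the per-nugget yes/no grades before the final score is computed.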
Bio: Laura Dietz is a Professor of Computer Science at the University of New Hampshire, where she leads the TREMA lab on Text Retrieval, Extraction, Machine Learning, and Analytics. She was previously part of the DWS group at the University of Mannheim, the CIIR lab at the University of Massachusetts, and the Max Planck Institute for Informatics in Germany.