Human Rights Benchmark for LLMs: Research Outcomes | Savannah Thais
Author: Women at The Table
Uploaded: 2025-12-08
Views: 37
Description:
We are advancing the Human Rights Benchmark for Large Language Models (LLMs), a research initiative that examines how these systems align with core human rights principles. AI models are making high-stakes decisions that directly impact human rights, yet no standard benchmark currently exists to evaluate their compliance. In this OpenStudio, Savannah Thais presents the outcomes of the benchmarking research and what it reveals about the human rights implications of LLMs.
The Human Rights Benchmark Project is a first-of-its-kind, expert-annotated dataset designed to test Large Language Models (LLMs) such as GPT, Claude, and Gemini on their understanding of international human rights law.
The presentation details the systematic IRAQ methodology (Issue, Rule Recall, Rule Application, Proposed Remedies), a modified legal reasoning framework based on real-world monitoring and reporting scenarios, and shares the surprising preliminary results from the Right to Water benchmark. Findings show that leading models score around 50-60% accuracy, demonstrating a significant gap in their internalized knowledge of human rights obligations.