Human Rights Benchmark for LLMs: Research Outcomes | Savannah Thais
Author: Women at The Table
Uploaded: 2025-12-08
Views: 37
Description:
We are advancing the Human Rights Benchmark for Large Language Models (LLMs), a research initiative that examines how these systems align with core human rights principles. AI models are making high-stakes decisions that directly impact human rights, yet no standard benchmark currently exists to evaluate their compliance. In this OpenStudio, Savannah Thais presents the outcomes of the benchmarking research and what it reveals about the human rights implications of LLMs.
The Human Rights Benchmark Project is a first-of-its-kind, expert-annotated dataset designed to test Large Language Models (LLMs) such as GPT, Claude, and Gemini on their understanding of international human rights law.
The presentation details the systematic IRAQ methodology (Issue, Rule Recall, Rule Application, Proposed Remedies), a modified legal reasoning framework based on real-world monitoring and reporting scenarios, and shares the surprising preliminary results from the Right to Water benchmark. Findings show that leading models score around 50-60% accuracy, demonstrating a significant gap in their internalized knowledge of human rights obligations.