How to find and measure AI's weak spots [Research, NAACL 2025 Outstanding Paper]

Author: Jordan Boyd-Graber

Uploaded: 2025-04-30

Views: 532

Description: Adversarial datasets should validate AI robustness by presenting samples that humans handle well but models struggle with. However, as models advance, these datasets risk becoming obsolete. Assessing whether a dataset remains adversarial is challenging because there is no standardized metric for adversarialness. To address this, we introduce AdvScore, a human-grounded evaluation metric that quantifies a dataset's adversarial nature by accounting for the differing abilities of models and humans while also identifying low-quality examples.

Publication information:
Yoo Yeon Sung, Maharshi Gor, Eve Fleisig, Ishani Mondal, and Jordan Lee Boyd-Graber. ADVSCORE: A Metric for the Evaluation and Creation of Adversarial Benchmarks. North American Association for Computational Linguistics, 2025.

Read the full paper:
http://cs.umd.edu/~jbg//docs/2025_naa...

Take Part in Human-Computer QA Competitions:
http://qanta.org
