How to find and measure AI's weak spots [Research, NAACL 2025 Outstanding Paper]
Author: Jordan Boyd-Graber
Uploaded: 2025-04-30
Views: 532
Description:
Adversarial datasets should validate AI robustness by presenting samples that humans handle well but models struggle with. However, as models advance, these datasets risk becoming obsolete. Assessing whether a dataset remains adversarial is challenging due to the absence of a standardized metric for adversarialness. To address this, we introduce AdvScore, a human-grounded evaluation metric that quantifies a dataset's adversarial nature by accounting for the differing abilities of models and humans while also identifying low-quality examples.
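For intuition, here is a minimal, illustrative sketch of the core idea: a dataset is adversarial while humans outperform models on its items, and items that even humans fail are likely low quality rather than genuinely adversarial. This is not the paper's actual AdvScore formulation; the function name, inputs, and threshold below are all hypothetical simplifications.

```python
# Illustrative sketch only: a naive "adversarialness" score based on the
# per-item gap between human and model accuracy. The paper's AdvScore is
# a more careful, human-grounded formulation; every name and threshold
# here is a hypothetical stand-in.
from statistics import mean

def naive_adv_score(human_correct, model_correct, low_quality_thresh=0.2):
    """human_correct, model_correct: per-item accuracies in [0, 1].

    Returns (score, low_quality_items): the mean human-minus-model
    accuracy gap over items not flagged as low quality. Items that even
    humans mostly fail are flagged as likely ambiguous or mislabeled
    rather than genuinely adversarial.
    """
    assert len(human_correct) == len(model_correct)
    low_quality = [i for i, h in enumerate(human_correct)
                   if h < low_quality_thresh]
    kept = [i for i in range(len(human_correct)) if i not in low_quality]
    if not kept:
        return 0.0, low_quality
    score = mean(human_correct[i] - model_correct[i] for i in kept)
    return score, low_quality

# A dataset stays adversarial while humans beat models on its items:
score, flagged = naive_adv_score([0.9, 0.8, 0.1], [0.3, 0.4, 0.05])
print(score, flagged)  # 0.5 [2]: large gap on kept items; item 2 flagged
```

As models improve, the per-item gap shrinks toward zero, which captures the sense in which an adversarial dataset can become obsolete.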
Publication information:
Yoo Yeon Sung, Maharshi Gor, Eve Fleisig, Ishani Mondal, and Jordan Lee Boyd-Graber. ADVSCORE: A Metric for the Evaluation and Creation of Adversarial Benchmarks. North American Chapter of the Association for Computational Linguistics, 2025.
Read the full paper:
http://cs.umd.edu/~jbg//docs/2025_naa...
Take Part in Human-Computer QA Competitions:
http://qanta.org