Chain of Thought: Introducing Remote Labor Index (RLI)
Author: Scale AI
Uploaded: 2025-12-24
Views: 1749048
Description:
Introducing the Remote Labor Index, RLI. Brad Kenstler, Head of Agent Capabilities and Environments, discusses RLI with Bing Liu, Head of Research, Madhu Sehwag, Research Scientist, and Mantas Mazeika, Research Scientist at the Center for AI Safety.
The Remote Labor Index (RLI) is a benchmark that empirically measures the capability of AI agents to perform real-world, economically valuable remote work.
0:00 Introduction
1:00 Overview of RLI
5:11 Benchmarking freelance work
10:32 Comparing RLI to other professional domain benchmarks
12:18 Deep dive on RLI tasks
17:27 Making tasks representative of real-world work
22:50 Rubrics vs judge-based evaluation
26:15 Bottlenecks on agentic capabilities
29:30 Which agents does RLI evaluate
34:04 Failure modes of RLI
37:30 Implications on the future of remote labor
42:10 Unlocking performance improvements on RLI
Learn more about the benchmark at: https://scale.com/leaderboard/rli