AI safety Thursdays: Chain-of-Thought Monitoring and AI Control

Author: Trajectory Labs

Uploaded: 2025-10-30

Views: 109

Description: Modern reasoning models do a lot of thinking in natural language before producing their outputs. Can we catch misbehaviours by our LLMs and interpret their motivations simply by reading these chains of thought?

In this talk, Rauno Arike and Rohan Subramani will give an overview of research areas in chain-of-thought monitorability and AI control, and discuss their recent research on the usefulness of chain-of-thought monitoring for ensuring that LLM agents pursue only the objectives their developers intended.

