This is what happens when you let AIs debate

Author: Machine Learning Street Talk

Uploaded: 2024-09-27

Views: 11105

Description: Akbir Khan, AI researcher and ICML best paper winner, discusses his work on AI alignment, debate techniques for truthful AI responses, and the future of artificial intelligence.

Key points discussed:
Using debate between language models to improve truthfulness in AI responses
Scalable oversight for supervising AI models beyond human-level intelligence
The relationship between intelligence and agency in AI systems
Challenges in AI safety and alignment
The potential for a Cambrian explosion in human-like intelligent systems

The discussion also explored broader topics:
The wisdom of crowds vs. expert knowledge in machine learning debates
Deceptive alignment and reward tampering in AI systems
Open-ended AI systems and their implications for development and safety
The space of possible minds and defining superintelligence
Cultural evolution and memetics in understanding intelligence

Akbir Khan:
https://x.com/akbirkhan
https://akbir.dev/

Show notes and transcript: https://www.dropbox.com/scl/fi/sjekiv...

TOC (* marks the best bits)
00:00:00 1. Intro: AI alignment and debate techniques for truthful responses *
00:05:00 2. Scalable oversight and hidden information settings
00:10:05 3. AI agency, intelligence, and progress *
00:15:00 4. Base models, RL training, and instrumental goals
00:25:11 5. Deceptive alignment and RL challenges in AI *
00:30:12 6. Open-ended AI systems and future directions
00:35:34 7. Deception, superintelligence, and the space of possible minds *
00:40:00 8. Cultural evolution, memetics, and intelligence measurement

References:
1. [00:00:40] Akbir Khan et al. ICML 2024 Best Paper: "Debating with More Persuasive LLMs Leads to More Truthful Answers"
https://arxiv.org/html/2402.06782v3

2. [00:03:28] Yann LeCun on machine learning debates
• Yann LeCun - A Path Towards Autonomous Mac...

3. [00:06:05] OpenAI's Superalignment team
https://openai.com/index/introducing-...

4. [00:08:10] Sam Bowman on scalable oversight in AI systems
https://arxiv.org/abs/2211.03540

5. [00:10:35] Sam Bowman on the sandwich protocol
https://www.alignmentforum.org/posts/...

6. [00:14:35] Janus' article on "Simulators" and LLMs
https://www.lesswrong.com/posts/vJFdj...

7. [00:16:35] Thomas Suddendorf's book "The Gap: The Science of What Separates Us from Other Animals"
https://www.amazon.in/GAP-Science-Sep...

8. [00:19:10] DeepMind on responsible AI
https://deepmind.google/about/respons...

9. [00:20:50] Technological singularity
https://en.wikipedia.org/wiki/Technol...

10. [00:21:30] Eliezer Yudkowsky on FOOM (Fast takeoff)
https://intelligence.org/files/AIFoom...

11. [00:21:45] Sammy Martin on recursive self-improvement in AI
https://www.alignmentforum.org/posts/...

12. [00:24:25] LessWrong community
https://www.lesswrong.com/

13. [00:24:35] Nora Belrose on AI alignment and deception
https://www.lesswrong.com/posts/YsFZF...

14. [00:25:35] Evan Hubinger on deceptive alignment in AI systems
https://www.lesswrong.com/posts/zthDP...

15. [00:26:50] Anthropic's article on reward tampering in language models
https://www.anthropic.com/research/re...

16. [00:32:35] Kenneth Stanley's work on open-endedness in AI
https://www.amazon.co.uk/Why-Greatnes...

17. [00:34:58] Ryan Greenblatt, Buck Shlegeris et al. on AI safety protocols
https://arxiv.org/pdf/2312.06942

18. [00:37:20] Aaron Sloman's concept of 'the space of possible minds'
https://www.cs.bham.ac.uk/research/pr...

19. [00:38:25] François Chollet on defining and measuring intelligence in AI
https://arxiv.org/abs/1911.01547

20. [00:42:30] Richard Dawkins on memetics
https://www.amazon.co.uk/Selfish-Gene...

21. [00:42:45] Jonathan Cook et al. on Artificial Generational Intelligence
https://arxiv.org/abs/2406.00392

22. [00:45:00] Peng on determinants of cryptocurrency pricing
https://www.emerald.com/insight/conte...

