This is what happens when you let AIs debate
Author: Machine Learning Street Talk
Uploaded: 2024-09-27
Views: 11,105
Description:
Akbir Khan, AI researcher and ICML best paper winner, discusses his work on AI alignment, debate techniques for truthful AI responses, and the future of artificial intelligence.
Key points discussed:
Using debate between language models to improve truthfulness in AI responses (a minimal sketch follows this list)
Scalable oversight for supervising AI models beyond human-level intelligence
The relationship between intelligence and agency in AI systems
Challenges in AI safety and alignment
The potential for a Cambrian explosion in human-like intelligent systems
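The debate setup from the first point above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `generate` helper, model names, prompts, and round count are all placeholders standing in for a real LLM API call and the actual protocol.

```python
# Sketch of a two-debater, one-judge LLM debate (names and prompts are illustrative).

def generate(model, prompt):
    # Placeholder for a real chat-completion call to `model` with `prompt`.
    return f"[{model} response to: {prompt[:60]}...]"

def debate(question, answer_a, answer_b,
           debater_a="strong-model", debater_b="strong-model",
           judge="weaker-judge-model", rounds=3):
    """Two debaters defend opposing answers; a judge picks the more convincing one."""
    transcript = []
    for r in range(rounds):
        # Each debater defends its assigned answer, seeing the transcript so far.
        for name, model, answer in (("A", debater_a, answer_a),
                                    ("B", debater_b, answer_b)):
            argument = generate(model,
                f"Question: {question}\nDefend answer {name}: {answer}\n"
                "Transcript so far:\n" + "\n".join(transcript))
            transcript.append(f"Round {r + 1}, Debater {name}: {argument}")
    # The judge sees only the question, the candidate answers, and the transcript
    # (not the underlying source text), then chooses A or B.
    return generate(judge,
        f"Question: {question}\nAnswer A: {answer_a}\nAnswer B: {answer_b}\n"
        "Debate:\n" + "\n".join(transcript) + "\nWhich answer is correct, A or B?")

if __name__ == "__main__":
    print(debate("Who poisoned the tea?", "the butler", "the gardener"))
```

The idea discussed in the episode is that a weaker judge supervising a debate between stronger, more persuasive models can land on the truthful answer more often than it would on its own.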
The discussion also explored broader topics:
The wisdom of crowds vs. expert knowledge in machine learning debates
Deceptive alignment and reward tampering in AI systems
Open-ended AI systems and their implications for development and safety
The space of possible minds and defining superintelligence
Cultural evolution and memetics in understanding intelligence
Akbir Khan:
https://x.com/akbirkhan
https://akbir.dev/
Show notes and transcript: https://www.dropbox.com/scl/fi/sjekiv...
TOC (* marks the best bits)
00:00:00 1. Intro: AI alignment and debate techniques for truthful responses *
00:05:00 2. Scalable oversight and hidden information settings
00:10:05 3. AI agency, intelligence, and progress *
00:15:00 4. Base models, RL training, and instrumental goals
00:25:11 5. Deceptive alignment and RL challenges in AI *
00:30:12 6. Open-ended AI systems and future directions
00:35:34 7. Deception, superintelligence, and the space of possible minds *
00:40:00 8. Cultural evolution, memetics, and intelligence measurement
References:
1. [00:00:40] Akbir Khan et al. ICML 2024 Best Paper: "Debating with More Persuasive LLMs Leads to More Truthful Answers"
https://arxiv.org/html/2402.06782v3
2. [00:03:28] Yann LeCun on machine learning debates
• Yann LeCun - A Path Towards Autonomous Mac...
3. [00:06:05] OpenAI's Superalignment team
https://openai.com/index/introducing-...
4. [00:08:10] Sam Bowman on scalable oversight in AI systems
https://arxiv.org/abs/2211.03540
5. [00:10:35] Sam Bowman on the sandwich protocol
https://www.alignmentforum.org/posts/...
6. [00:14:35] Janus' article on "Simulators" and LLMs
https://www.lesswrong.com/posts/vJFdj...
7. [00:16:35] Thomas Suddendorf's book "The Gap: The Science of What Separates Us from Other Animals"
https://www.amazon.in/GAP-Science-Sep...
8. [00:19:10] DeepMind on responsible AI
https://deepmind.google/about/respons...
9. [00:20:50] Technological singularity
https://en.wikipedia.org/wiki/Technol...
10. [00:21:30] Eliezer Yudkowsky on FOOM (Fast takeoff)
https://intelligence.org/files/AIFoom...
11. [00:21:45] Sammy Martin on recursive self-improvement in AI
https://www.alignmentforum.org/posts/...
12. [00:24:25] LessWrong community
https://www.lesswrong.com/
13. [00:24:35] Nora Belrose on AI alignment and deception
https://www.lesswrong.com/posts/YsFZF...
14. [00:25:35] Evan Hubinger on deceptive alignment in AI systems
https://www.lesswrong.com/posts/zthDP...
15. [00:26:50] Anthropic's article on reward tampering in language models
https://www.anthropic.com/research/re...
16. [00:32:35] Kenneth Stanley's work on open-endedness in AI
https://www.amazon.co.uk/Why-Greatnes...
17. [00:34:58] Ryan Greenblatt, Buck Shlegeris et al. on AI safety protocols
https://arxiv.org/pdf/2312.06942
18. [00:37:20] Aaron Sloman's concept of 'the space of possible minds'
https://www.cs.bham.ac.uk/research/pr...
19. [00:38:25] François Chollet on defining and measuring intelligence in AI
https://arxiv.org/abs/1911.01547
20. [00:42:30] Richard Dawkins on memetics
https://www.amazon.co.uk/Selfish-Gene...
21. [00:42:45] Jonathan Cook et al. on Artificial Generational Intelligence
https://arxiv.org/abs/2406.00392
22. [00:45:00] Peng on determinants of cryptocurrency pricing
https://www.emerald.com/insight/conte...