Ep73: Deception Emerged in AI: Why It’s Almost Impossible to Detect
Author: Machine Learning Made Simple
Uploaded: 2025-05-06
Views: 2,270
Description:
Discover the hidden dangers of AI deception—and why we’re struggling to detect it.
In this episode, we explore how advanced language models may be developing deceptive behaviors, the psychological frameworks used to evaluate them, and what this means for AI safety, regulation, and trust.
You’ll learn how researchers test for deception, why emergent behavior in AI is difficult to diagnose, and the ethical stakes of machines that might pretend to be less capable than they are.
Chapters:
00:00 Introduction and Overview
13:30 Evolution of Mental States in Large Language Models
25:24 Deception and Cooperation in Language Models
30:38 Deception Abilities Emerged in Large Language Models
36:07 Frontier Models are Capable of In-context Scheming
44:29 Alignment faking in large language models
53:55 Technical Deep-Dive into Sandbagging Attempts
If you're interested in AI safety and alignment, this is a must-watch.
📺 YouTube Channel
www.youtube.com/@LLMPodcast
🎧 Listen on the Go
Catch all episodes on Spotify:
creators.spotify.com/pod/show/mlsimple
Also explore our advanced research series:
www.youtube.com/@theaistack
💬 Join the Conversation
Connect with fellow AI professionals in our LinkedIn Group:
www.linkedin.com/groups/14465220/
📰 Subscribe to the Newsletter
Weekly insights on AI systems, governance, and society:
www.linkedin.com/newsletters/7315482226752700416/
—
✅ Like this episode?
Tap 👍 to support thoughtful AI discourse
Hit 🔔 to stay updated on future topics
Comment below: What excites or worries you most about AI-driven oversight?
—
#AIRegulation #AgenticSystems #ArtificialIntelligence #MachineLearning
#AIEthics #LLMGovernance #AIInfrastructure #MLPodcast