Do you know about LLM Jailbreaking?
Author: Sawlemon
Uploaded: 2026-02-15
Views: 63
Description:
Disclaimer: This video is strictly for educational purposes to help developers and security professionals understand AI vulnerabilities (OWASP Top 10 for LLM) and build safer systems.
In this video, we explore the fascinating and critical world of AI Security, focusing on Jailbreaking Large Language Models (LLMs). Originally presented at an OWASP cybersecurity meetup, this session explains how models like ChatGPT, Gemini, and Claude are built, and more importantly, how their safety guardrails can be bypassed.
We start with the evolution of GPT models and look at real-world incidents, such as the viral Instamart refund scam and the Replit AI database deletion. The core of the video breaks down specific jailbreak techniques used by security researchers (Red Teamers) to test AI safety.
Key techniques covered include:
Indirect Requests: Using roleplay to bypass restrictions.
The Grandmother Exploit: The famous "Napalm Factory" prompt.
System Overrides: Leaking the hidden system prompt (e.g., Sydney/Bing).
The Crescendo Attack: Gradually building up harmful context.
Obfuscation: Using Leetspeak, Base64, and Homoglyphs to confuse the model.
Many-shot Jailbreaking: Overloading the context window.
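To make the obfuscation techniques from the list concrete, here is a minimal sketch of the three transforms the video names: leetspeak, Base64 encoding, and homoglyph substitution. The specific character mappings below are illustrative assumptions, not the exact substitutions demonstrated in the talk; the idea is simply that the surface form of a request changes while its meaning stays recoverable by the model.

```python
import base64

# Leetspeak: swap letters for visually similar digits (illustrative mapping).
LEET_MAP = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"})

# Homoglyphs: replace Latin letters with look-alike Cyrillic code points
# (illustrative subset; many more confusable pairs exist).
HOMOGLYPH_MAP = str.maketrans({"a": "\u0430", "e": "\u0435", "o": "\u043e", "c": "\u0441"})

def leetspeak(text: str) -> str:
    """Return the lowercased text with leet substitutions applied."""
    return text.lower().translate(LEET_MAP)

def to_base64(text: str) -> str:
    """Return the UTF-8 bytes of the text encoded as Base64."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")

def homoglyphs(text: str) -> str:
    """Return text that renders the same but uses different code points."""
    return text.translate(HOMOGLYPH_MAP)

prompt = "example request"
print(leetspeak(prompt))   # -> "3x4mpl3 r3qu357"
print(to_base64(prompt))   # -> "ZXhhbXBsZSByZXF1ZXN0"
print(homoglyphs(prompt))  # looks identical, but the bytes differ
```

Defenders can use the same transforms in reverse: normalizing homoglyphs and decoding common encodings before a prompt reaches the model is a standard input-sanitization step.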
⏱️ Timestamps:
00:00 - Introduction & OWASP Meetup Context
00:40 - History & Evolution of LLMs (GPT-1 to GPT-4)
02:05 - AI Gone Wrong: Instamart Scam & Replit Accident
03:20 - What is LLM Jailbreaking?
04:35 - How LLMs Actually "Think" (Next Word Prediction)
07:12 - Technique 1: Indirect Requests & Roleplay
07:58 - Technique 2: The Grandmother Exploit (Napalm Factory)
08:48 - Technique 3: System Overrides & Prompt Leaking
10:45 - Technique 4: The Crescendo Attack (Molotov Cocktail)
13:06 - Technique 5: Alternative Universe (The "Kaithi/Vikram" Logic)
13:55 - Technique 6: Homoglyphic Substitution
14:50 - Technique 7: Obfuscation (Leetspeak & Encodings)
16:30 - Technique 8: Many-shot Jailbreaking
18:10 - The "Seahorse is an Emoji" Glitch
19:15 - Conclusion & Learning Resources (Gandalf/Lakera)