Grokking, Generalization Collapse, and Dynamics of Training Deep Neural Nets [Charles Martin] - 734

Author: The TWIML AI Podcast with Sam Charrington

Uploaded: 2025-06-04

Views: 1195

Description: Today, we're joined by Charles Martin, founder of Calculation Consulting, to discuss WeightWatcher, an open-source tool for analyzing and improving Deep Neural Networks (DNNs) based on principles from theoretical physics. We explore the foundations of the Heavy-Tailed Self-Regularization (HTSR) theory that underpins it, which combines random matrix theory and renormalization group ideas to uncover deep insights about model training dynamics. Charles walks us through WeightWatcher’s ability to detect three distinct learning phases (underfitting, grokking, and generalization collapse) and how its signature “layer quality” metric reveals whether individual layers are underfit, overfit, or optimally tuned. Additionally, we dig into the complexities involved in fine-tuning models, the surprising correlation between model optimality and hallucination, the often-underestimated challenges of search relevance, and their implications for RAG. Finally, Charles shares his insights into real-world applications of generative AI and his lessons learned from working in the field.
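
As a rough illustration of the kind of per-layer measurement the description refers to, the sketch below estimates a power-law tail exponent for the eigenvalue spectrum of a single weight matrix, using only NumPy. It is not WeightWatcher's implementation: the function names `hill_alpha` and `layer_alpha`, the Hill estimator, and the `tail_fraction` cutoff are all assumptions chosen for brevity, and the tool itself performs its own, more careful power-law fit.

```python
import numpy as np


def hill_alpha(values, tail_fraction=0.5):
    """Hill-style estimate of alpha for a density p(x) ~ x^(-alpha).

    Hypothetical helper: uses the largest `tail_fraction` of the values
    and treats the smallest value in that tail as x_min.
    """
    x = np.sort(np.asarray(values, dtype=float))[::-1]  # descending order
    k = max(2, int(len(x) * tail_fraction))             # tail size
    tail = x[:k]
    # Maximum-likelihood form: alpha = 1 + k / sum(log(x_i / x_min))
    return 1.0 + k / np.sum(np.log(tail / tail[-1]))


def layer_alpha(weights, tail_fraction=0.5):
    """Estimate alpha for the eigenvalues of W^T W of one layer."""
    # The eigenvalues of W^T W are the squared singular values of W.
    eigs = np.linalg.svd(weights, compute_uv=False) ** 2
    return hill_alpha(eigs, tail_fraction)


# Stand-in for a layer: a random Gaussian matrix (an untrained-like baseline).
rng = np.random.default_rng(0)
W = rng.normal(size=(300, 100))
print(f"alpha for a random Gaussian layer: {layer_alpha(W):.2f}")
```

In this framing a heavier-tailed spectrum gives a smaller exponent. The thresholds that separate underfit, overfit, and well-tuned layers are the subject of the episode and the HTSR paper, and are not encoded in this sketch.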

🗒️ For the full list of resources for this episode, visit the show notes page: https://twimlai.com/go/734.

🔔 Subscribe to our channel for more great content just like this: https://youtube.com/twimlai?sub_confi...

🗣️ CONNECT WITH US!
===============================
Subscribe to the TWIML AI Podcast: https://twimlai.com/podcast/twimlai/
Follow us on Twitter: / twimlai
Follow us on LinkedIn: / twimlai
Join our Slack Community: https://twimlai.com/community/
Subscribe to our newsletter: https://twimlai.com/newsletter/
Want to get in touch? Send us a message: https://twimlai.com/contact/

📖 CHAPTERS
===============================
00:00 - Introduction
4:08 - WeightWatcher
4:50 - Applying quant techniques to AI
7:03 - Overfitting and underfitting in models
11:40 - Challenges in fine-tuning
17:00 - Degrees of fine-tuning
22:15 - Spiking neural networks
27:57 - Grokking
29:30 - Generalization collapse
30:00 - HTSR theory
34:17 - Data-centric AI and layer-specific training
38:45 - Renormalization group
39:29 - Challenges in data access and compliance
42:45 - Benchmarking
47:50 - The correlation between hallucination and model optimality
54:14 - Application of theoretical physics to AI
1:00:58 - Renormalization group, HTSR, and critical exponents
1:08:53 - Evaluation of grokking paper
1:13:25 - Real-world applications and lessons learned

🔗 LINKS & RESOURCES
===============================
Calculation Consulting - https://calculationconsulting.com/
WeightWatcher - https://weightwatcher.ai/
Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning (HTSR paper) - https://jmlr.org/papers/v22/20-410.html

📸 Camera: https://amzn.to/3TQ3zsg
🎙️Microphone: https://amzn.to/3t5zXeV
🚦Lights: https://amzn.to/3TQlX49
🎛️ Audio Interface: https://amzn.to/3TVFAIq
🎚️ Stream Deck: https://amzn.to/3zzm7F5

