BIG Mistake in Adam | Adam vs AdamW
Author: Build AI with Sandeep
Uploaded: 2026-03-07
Views: 32
Description:
In this video we clearly explain the difference between the Adam and AdamW optimizers used in deep learning and machine learning.
Many people use Adam without understanding how weight decay and L2 regularization behave inside adaptive optimizers. This video explains:
• Why momentum uses a running mean of gradients
• Why RMSProp uses squared gradients
• What weight decay actually means
• How L2 regularization changes the gradient
• Why Adam mixes weight decay incorrectly
• How AdamW fixes the problem with decoupled weight decay (see the sketch after this list)
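A minimal sketch of the core idea, not code from the video: it contrasts a single Adam update where weight decay is folded into the gradient as L2 regularization with a single AdamW update that applies decoupled weight decay. The hyperparameter values (lr, beta1, beta2, eps, wd) are illustrative assumptions.

```python
# Illustrative sketch: Adam with L2-style weight decay vs AdamW with decoupled decay.
# Hyperparameter values are assumptions, not recommendations from the video.
import numpy as np

def adam_l2_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, wd=0.01):
    """Adam where weight decay is mixed into the gradient as L2 regularization."""
    grad = grad + wd * w                          # decay term enters the gradient...
    m = beta1 * m + (1 - beta1) * grad            # running mean of gradients (momentum)
    v = beta2 * v + (1 - beta2) * grad**2         # running mean of squared gradients (RMSProp-style)
    m_hat = m / (1 - beta1**t)                    # bias correction
    v_hat = v / (1 - beta2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)   # ...so the decay gets rescaled per parameter
    return w, m, v

def adamw_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, wd=0.01):
    """AdamW: weight decay is decoupled from the gradient and applied to the weights directly."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)   # pure gradient step
    w = w - lr * wd * w                           # decoupled decay: uniform shrink toward zero
    return w, m, v
```

The difference this sketch highlights: with L2 regularization the decay term is divided by the adaptive denominator, so parameters with large gradient history are decayed less; AdamW shrinks every weight by the same factor regardless of its gradient statistics.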
This topic is important for anyone working in:
• Deep Learning
• Machine Learning
• Neural Networks
• Transformers
• PyTorch / TensorFlow models
Most modern models like BERT, GPT, and Vision Transformers use AdamW, so understanding this optimizer is essential.
If you are preparing for ML interviews, doing research, or building deep learning models, this explanation will help you understand optimizers more clearly.
#AI #MachineLearning #Transformers #LLMs #DeepLearning #ArtificialIntelligence #GPT #BERT #OpenAI #BuildAIwithSandeep #optimizers #adamw