Understanding Proximal Policy Optimization | PPO | Machine Learning Series | TFD Workshop
Author: TFDevs
Uploaded: 2026-02-14
Views: 84
Description:
Recorded video of the online workshop "Understanding Proximal Policy Optimization", part of the Machine Learning Series
Download the demo and exercises: https://github.com/tfd-ed/tfd-worksho...
TFD Workshop Repo: https://github.com/tfd-ed/tfd-workshop
🔑 What you will learn
Part 1: Reinforcement Learning Foundations
The RL framework: agents, environments, rewards, and policies
States, observations, and action spaces (discrete vs continuous)
The credit assignment problem and why RL is challenging
Real-world RL applications (games, robotics, control systems)
Part 2: Policy Gradient Methods
From value-based to policy-based methods
Understanding the policy gradient theorem
Why vanilla policy gradients are unstable
The importance of trust regions in learning
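For orientation, the policy gradient theorem listed above is commonly written in the form below. This is the standard textbook statement, not necessarily the notation used in the workshop; the symbols J (expected return), \pi_\theta (policy with parameters \theta) and \hat{A}_t (advantage estimate) are the conventional names.

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim \pi_\theta}
    \left[ \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, \hat{A}_t \right]
```

Because the expectation is estimated from a handful of sampled trajectories, a single large gradient step can move the policy far from the one that collected the data, which is one common explanation for the instability of vanilla policy gradients.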
Part 3: Understanding PPO
The fundamental problem PPO solves
Clipping mechanism and surrogate objectives
Actor-Critic architecture
Generalized Advantage Estimation (GAE)
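The clipping mechanism and GAE named above are usually written as follows. This is the standard formulation from the PPO and GAE literature and may differ from the workshop's notation; \epsilon (clip range), \gamma (discount factor) and \lambda (GAE parameter) are conventional symbols, and R_t denotes the reward at step t to avoid a clash with the probability ratio r_t(\theta).

```latex
\begin{align}
r_t(\theta) &= \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)} \\
L^{\text{CLIP}}(\theta) &= \hat{\mathbb{E}}_t \left[ \min\left( r_t(\theta)\,\hat{A}_t,\;
  \operatorname{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right) \hat{A}_t \right) \right] \\
\delta_t &= R_t + \gamma V(s_{t+1}) - V(s_t) \\
\hat{A}_t &= \sum_{l=0}^{\infty} (\gamma\lambda)^{l}\, \delta_{t+l}
\end{align}
```

The min with the clipped term removes any incentive to push the ratio outside [1-\epsilon, 1+\epsilon] in the direction that would increase the objective, which is how PPO approximates a trust region.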
Part 4: Complete PPO Implementation
Actor and Critic neural networks in PyTorch
Memory buffer for experience collection
Computing advantages and returns
The PPO update loop with clipping
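As a rough illustration of the last two items, here is a minimal sketch of advantage computation with GAE and of the clipped loss. It is written in plain Python rather than PyTorch so it runs without dependencies, and it is not the workshop's code: the function names compute_gae and clipped_surrogate_loss, their parameters, and the default values are assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    last_value: float,
    gamma: float = 0.99,  # assumed default, not taken from the workshop
    lam: float = 0.95,    # assumed default, not taken from the workshop
) -> Tuple[List[float], List[float]]:
    """Return (advantages, returns) for one rollout using GAE.

    values[t] is the critic's estimate V(s_t); last_value is V of the state
    that follows the final step and is used for bootstrapping.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    # Walk backwards so each step can reuse the accumulated future advantage.
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0  # stop bootstrapping at episode end
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    # Returns are the regression targets for the critic.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def clipped_surrogate_loss(
    ratios: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # assumed default clip range
) -> float:
    """Return the negated mean clipped surrogate objective (a loss to minimize)."""
    total = 0.0
    for ratio, adv in zip(ratios, advantages):
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return -total / len(ratios)
```

In a PyTorch implementation the same logic would typically operate on tensors, with the ratio computed as the exponential of the difference between new and old log-probabilities, so that gradients flow through the new policy only.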
Part 5: Training the Lunar Lander
Environment setup with Gymnasium
Hyperparameter configuration
Training loop implementation
Monitoring and debugging training metrics
Visualizing learned behaviors
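To make the hyperparameter configuration step concrete, here is one possible way to group PPO settings in a single object. The class name PPOConfig, its fields, and every default value are assumptions chosen for illustration (they are commonly seen starting points, not the values used in the workshop) and would need tuning for Lunar Lander.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PPOConfig:
    """Hypothetical container for PPO hyperparameters (illustrative defaults)."""

    gamma: float = 0.99           # discount factor
    gae_lambda: float = 0.95      # GAE bias/variance trade-off
    clip_eps: float = 0.2         # clip range for the probability ratio
    learning_rate: float = 3e-4   # optimizer step size
    rollout_steps: int = 2048     # environment steps collected per update
    update_epochs: int = 10       # passes over each rollout
    minibatch_size: int = 64      # samples per gradient step

    def __post_init__(self) -> None:
        # Basic sanity checks so a bad setting fails early.
        if not 0.0 < self.gamma <= 1.0:
            raise ValueError("gamma must be in (0, 1]")
        if not 0.0 <= self.gae_lambda <= 1.0:
            raise ValueError("gae_lambda must be in [0, 1]")
        if self.clip_eps <= 0.0:
            raise ValueError("clip_eps must be positive")
        if self.rollout_steps % self.minibatch_size != 0:
            raise ValueError("rollout_steps must be divisible by minibatch_size")

    def minibatches_per_epoch(self) -> int:
        """Number of gradient steps in one pass over a rollout."""
        return self.rollout_steps // self.minibatch_size
```

Keeping the settings in one immutable object makes it easier to log exactly which configuration produced a given reward curve when comparing training runs.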
Live Demonstrations
Lunar Lander Environment - Understanding the observation space and actions
Untrained Agent Behavior - Random actions and crashes
PPO Training Process - Watching the agent learn in real-time
Trained Agent Performance - Successful landings and optimal behavior
Training Metrics Visualization - Interpreting reward curves and losses
Hands-On Lab Exercises
Exercise 1: Understanding the environment and action space
Exercise 2: Implementing the Actor-Critic networks
Exercise 3: Computing advantages with GAE
Exercise 4: The PPO update step
Exercise 5: Training your own agent
IG: / darachaukh
YouTube: / @tfdevs
Website: https://www.tfdevs.com/
LinkedIn: / qiang-cun-zhi
TikTok: https://www.tiktok.com/@chaudarakh?_r...
Telegram Channel: https://t.me/tfdTech
Facebook Page: / chaudarascienceengineer
#MachineLearning #ReinforcementLearning #AI #PPO #Workshop #TechEducation #LearningByDoing #AIWorkshop #DeepLearning #PyTorch