Using MuZero's Tree Search To Find Optimal Tic-Tac-Toe Strategy in a Spreadsheet
Author: Concepts Illuminated
Uploaded: 2022-06-11
Views: 10865
Description:
A video that explores how the MuZero algorithm combines aspects of Reinforcement Learning and Monte Carlo Tree Search to play efficiently. The animations in the video use a spreadsheet that serves as a worked example of the calculations involved in the algorithm. Formulas discussed include the Upper Confidence Bound for Trees (abbreviated UCB or UCT).
MuZero Tree Search Spreadsheet:
https://bit.ly/MuZeroSheetCopy (if you want to mess with the values yourself)
https://bit.ly/MuZeroSheetView (if you just want to inspect)
Playlist implementing other neural networks in a spreadsheet:
• Neural Networks in Spreadsheets
Reinforcement Learning explained by MIT:
• MIT 6.S191 (2022): Reinforcement Learning
MuZero Paper and pseudocode:
https://arxiv.org/pdf/1911.08265
https://arxiv.org/src/1911.08265v2/an...
MuZero talk from one of the authors, Julian Schrittwieser:
• MuZero - ICAPS 2020
Slides https://drive.google.com/file/d/1nwRR...
Monte Carlo Tree Search Visualization by Vinícius Garcia
• Monte Carlo Tree Search - Tic-Tac-Toe Visu...
https://vgarciasc.github.io/mcts-viz/
Other Articles on Reinforcement Learning
/ how-to-train-ai-agents-to-play-multiplayer...
https://dev.to/satwikkansal/a-gentle-...
Other Articles/Videos on Monte Carlo Tree Search (MCTS)
https://towardsdatascience.com/monte-...
http://jeffbradberry.com/posts/2015/0...
/ monte-carlo-tree-search-applied-to-letterp...
Animation by J O • Monte Carlo Tree Search animation - REVISITED
Reinforcement Learning and RTS games by Edan Meyer
• Reinforcement Learning in RTS Games
Image Credits:
/ 1
https://www.deepmind.com/research/hig...
Narration by James K. Script and visualizations by Kaylee L.
0:00 - Why is MuZero important?
1:05 - Outline/Overview
1:50 - MuZero's three neural networks
3:11 - Key Vocabulary Terms - state, reward, policy, action, value
4:50 - Assumptions for Tree Search Example
5:18 - What will Tree Search find? (The best action)
5:38 - Tree Search Setup
6:55 - Tree Search Begins: Selection, Expansion & Evaluation, Tree Update
11:37 - Upper Confidence Bound (UCB) Formula Explained
14:12 - First Winning Move Discovered!
15:03 - Tree Search Value Explained
18:45 - Using Tree Search Results to Form Policy and Select Action
20:30 - Collecting Training Data
22:16 - Unrolling Representation, Dynamics, and Prediction Networks for Training
24:26 - Playing Better with trained Neural Networks
26:13 - Recap
Special Thanks to
Robbie Close and Pat Berard for excellent feedback on initial drafts
P.S. I was wondering where the UCB constant 19652 came from. I emailed the primary author and got the following response:
"19652 was found using automatic hyperparameter optimization. The exact value is not critical i.e. 19000 or 20000 would work just as well."
I looked more closely at Appendix C (Hyperparameters) and saw that they did this tuning for AlphaZero and simply reused the same UCB constants for MuZero.
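To see why the exact value is not critical, here is a minimal Python sketch of the UCB score in the form given in the MuZero paper, with c1 = 1.25 and c2 = 19652. The function name `ucb_score` and its parameter names are made up for illustration, and `q` is assumed to be a value already normalized to [0, 1]; the spreadsheet may lay the calculation out differently.

```python
import math

C1 = 1.25    # first UCB constant from the MuZero paper
C2 = 19652   # second UCB constant, the one discussed in the P.S.


def ucb_score(q, prior, parent_visits, child_visits, c1=C1, c2=C2):
    """Hypothetical helper: UCB score of one child action.

    q             -- mean value of the child (assumed normalized to [0, 1])
    prior         -- policy prior P(s, a) for the child
    parent_visits -- total visit count of the parent node
    child_visits  -- visit count of the child node
    """
    # The log term grows very slowly because c2 is so large.
    exploration = c1 + math.log((parent_visits + c2 + 1) / c2)
    # Exploration bonus shrinks as the child is visited more often.
    exploration *= math.sqrt(parent_visits) / (1 + child_visits)
    return q + prior * exploration


if __name__ == "__main__":
    # Compare the score for nearby values of c2 at a small visit count.
    for c2 in (19000, 19652, 20000):
        print(c2, ucb_score(q=0.5, prior=0.3, parent_visits=50,
                            child_visits=5, c2=c2))
```

With only tens or hundreds of simulations, the log term stays close to zero for any c2 in this range, so the score is dominated by c1, the prior, and the visit counts. That fits the author's remark that 19000 or 20000 would work just as well.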