Using MuZero's Tree Search To Find Optimal Tic-Tac-Toe Strategy in a Spreadsheet

Author: Concepts Illuminated

Uploaded: 2022-06-11

Views: 10865

Description: A video that explores how the MuZero algorithm combines aspects of Reinforcement Learning and Monte Carlo Tree Search to play efficiently. The animations in the video use a spreadsheet that serves as a worked example of the calculations involved in the algorithm. Formulas discussed include the Upper Confidence Bound (UCB) formula, called UCT (Upper Confidence Bounds for Trees) when it is applied to tree search.
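The UCB formula drives the selection step of the tree search: at each node, the child with the highest score is followed. Below is a minimal Python sketch, assuming the pUCT variant given in the MuZero paper's pseudocode (c1 = 1.25, c2 = 19652) and Q-values that have already been min-max normalized to [0, 1]. The function names and the `children` dictionary layout are hypothetical illustrations, not the spreadsheet's own cells or the paper's code.

```python
import math

# Constants from the MuZero paper's pseudocode (pb_c_init and pb_c_base).
C1 = 1.25
C2 = 19652


def ucb_score(parent_visits: int, child_visits: int, prior: float, q_value: float) -> float:
    """pUCT score for one child during selection.

    Assumes q_value is already normalized to [0, 1]; MuZero normalizes
    values with min-max statistics collected over the search tree.
    Unvisited children are assumed to carry q_value = 0.
    """
    # Exploration weight grows very slowly with the parent's visit count.
    weight = C1 + math.log((parent_visits + C2 + 1) / C2)
    # Favor children with a high prior that have been visited little so far.
    exploration = prior * weight * math.sqrt(parent_visits) / (child_visits + 1)
    return q_value + exploration


def select_child(parent_visits: int, children: dict) -> int:
    """Return the action with the highest pUCT score.

    Hypothetical layout: children maps action -> (visit_count, prior, normalized_q).
    """
    return max(children, key=lambda a: ucb_score(parent_visits, *children[a]))


# Hypothetical tic-tac-toe root: squares 0, 4 and 8 are the candidate moves.
children = {0: (0, 0.5, 0.0), 4: (3, 0.3, 0.6), 8: (1, 0.2, 0.9)}
print(select_child(4, children))
```

With these made-up numbers, the unvisited square 0 wins on its exploration bonus even though square 8 has the best value estimate; as visits accumulate, the value term dominates.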

MuZero Tree Search Spreadsheet:
https://bit.ly/MuZeroSheetCopy (if you want to mess with the values yourself)
https://bit.ly/MuZeroSheetView (if you just want to inspect)

Playlist implementing other neural networks in a spreadsheet:
   • Neural Networks in Spreadsheets

Reinforcement Learning explained by MIT:
   • MIT 6.S191 (2022): Reinforcement Learning

MuZero paper and pseudocode:
https://arxiv.org/pdf/1911.08265
https://arxiv.org/src/1911.08265v2/an...

MuZero talk from one of the authors, Julian Schrittwieser:
   • MuZero - ICAPS 2020
Slides https://drive.google.com/file/d/1nwRR...

Monte Carlo Tree Search Visualization by Vinícius Garcia:
   • Monte Carlo Tree Search - Tic-Tac-Toe Visu...
https://vgarciasc.github.io/mcts-viz/

Other Articles on Reinforcement Learning:
  / how-to-train-ai-agents-to-play-multiplayer...
https://dev.to/satwikkansal/a-gentle-...

Other Articles/Videos on Monte Carlo Tree Search (MCTS):
https://towardsdatascience.com/monte-...
http://jeffbradberry.com/posts/2015/0...
  / monte-carlo-tree-search-applied-to-letterp...
Animation by J O    • Monte Carlo Tree Search animation - REVISITED

Reinforcement Learning and RTS games by Edan Meyer:
   • Reinforcement Learning in RTS Games

Image Credits:
  / 1
https://www.deepmind.com/research/hig...

Narration by James K. Script and visualizations by Kaylee L.

0:00 - Why is MuZero important?
1:05 - Outline/Overview
1:50 - MuZero's three neural networks
3:11 - Key Vocabulary Terms - state, reward, policy, action, value
4:50 - Assumptions for Tree Search Example
5:18 - What will Tree Search find? (The best action)
5:38 - Tree Search Setup
6:55 - Tree Search Begins: Selection, Expansion & Evaluation, Tree Update
11:37 - Upper Confidence Bound (UCB) Formula Explained
14:12 - First Winning Move Discovered!
15:03 - Tree Search Value Explained
18:45 - Using Tree Search Results to Form Policy and Select Action
20:30 - Collecting Training Data
22:16 - Unrolling Representation, Dynamics, and Prediction Networks for Training
24:26 - Playing Better with Trained Neural Networks
26:13 - Recap
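The chapter at 18:45 covers turning search results into a policy. In MuZero the search policy at the root is proportional to each action's visit count, and during self-play an action is sampled from it, with a temperature controlling how exploratory the choice is. The sketch below assumes the common AlphaZero-style form N(a)^(1/T), normalized over actions, with T = 0 meaning "pick the most-visited action"; the function names and the visit counts are hypothetical.

```python
import random


def visit_count_policy(visit_counts: dict, temperature: float = 1.0) -> dict:
    """Turn root visit counts into a probability for each action.

    Assumed form: pi(a) = N(a)^(1/T) / sum_b N(b)^(1/T); T = 0 is greedy.
    """
    if temperature == 0:
        best = max(visit_counts, key=visit_counts.get)
        return {a: 1.0 if a == best else 0.0 for a in visit_counts}
    weights = {a: n ** (1.0 / temperature) for a, n in visit_counts.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def select_action(visit_counts: dict, temperature: float = 1.0, rng=random) -> int:
    """Sample one action from the visit-count policy."""
    policy = visit_count_policy(visit_counts, temperature)
    actions = list(policy)
    return rng.choices(actions, weights=[policy[a] for a in actions], k=1)[0]


# Hypothetical root after 10 simulations: square 4 was searched most.
counts = {0: 1, 4: 6, 8: 3}
print(visit_count_policy(counts))        # training target for the policy head
print(select_action(counts, 0.0))        # greedy move once exploration is off
```

The same visit-count distribution doubles as the policy target stored in the training data, which is why the search result matters beyond choosing a single move.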

Special Thanks to
Robbie Close and Pat Berard for excellent feedback on initial drafts

P.S. I was wondering where the UCB constant 19652 came from. I emailed the primary author and got the following response:
"19652 was found using automatic hyperparameter optimization. The exact value is not critical i.e. 19000 or 20000 would work just as well."
I looked more closely at Appendix C (Hyperparameters) and saw that they did this tuning for AlphaZero and simply reused the same UCB constants for MuZero.
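One way to see why the exact value is not critical: in the paper's pUCT formula, 19652 only enters through the exploration weight c1 + log((N + c2 + 1) / c2), where N is the parent's visit count. The sketch below, assuming c1 = 1.25 as in the MuZero pseudocode, compares that weight for c2 = 19000, 19652 and 20000 at a few visit counts; the helper name is hypothetical.

```python
import math


def exploration_weight(parent_visits: int, c2: float, c1: float = 1.25) -> float:
    """Exploration weight from the pUCT formula; c2 sets how slowly it grows."""
    return c1 + math.log((parent_visits + c2 + 1) / c2)


for c2 in (19000, 19652, 20000):
    weights = [round(exploration_weight(n, c2), 4) for n in (10, 800, 100_000)]
    print(c2, weights)
```

For visit counts far below c2 the log term is close to N / c2, so the weight stays near c1 and the three settings are nearly indistinguishable; they only drift apart by a few percent once a node has been visited on the order of c2 times or more.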
