Computer chess with model predictive control and reinforcement learning

Author: Dimitri Bertsekas

Uploaded: 2025-01-29

Views: 1694

Description: Paper and slides at
https://web.mit.edu/dimitrib/www/MPC_...
https://web.mit.edu/dimitrib/www/MPC-...
We apply model predictive control (MPC), rollout, and reinforcement learning (RL) methodologies to computer chess. We introduce a new architecture for move selection, within which available chess engines are used as components. One engine provides position evaluations in an approximation in value space MPC/RL scheme, while a second engine serves as a nominal opponent that emulates or approximates the moves of the true opponent.
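The two-engine move selection described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function `select_move`, its callback parameters, and the toy integer game are all hypothetical names introduced here for illustration. A real setup would plug actual chess engines in as the `nominal_reply` and `evaluate` callbacks. The sketch tries every legal move, lets the nominal opponent answer, and scores the resulting position with the evaluation engine.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

P = TypeVar("P")  # position type (hypothetical)
M = TypeVar("M")  # move type (hypothetical)


def select_move(
    position: P,
    legal_moves: Callable[[P], Iterable[M]],
    apply_move: Callable[[P, M], P],
    nominal_reply: Callable[[P], Optional[M]],
    evaluate: Callable[[P], float],
) -> Tuple[Optional[M], float]:
    """One-step lookahead over all of our legal moves.

    For each move: apply it, let the nominal opponent (second engine) reply,
    then score the resulting position with the evaluation engine (first engine).
    Returns the move with the highest score and that score.
    """
    best_move: Optional[M] = None
    best_value = float("-inf")
    for move in legal_moves(position):
        after_move = apply_move(position, move)
        reply = nominal_reply(after_move)  # None means no reply is available
        after_reply = after_move if reply is None else apply_move(after_move, reply)
        value = evaluate(after_reply)
        if value > best_value:
            best_move, best_value = move, value
    return best_move, best_value


# Toy stand-in for a chess position: an integer that we want to push toward 10.
def toy_legal_moves(n: int) -> list:
    return [1, 2, 3]  # we may add 1, 2, or 3


def toy_apply(n: int, step: int) -> int:
    return n + step


def toy_nominal_reply(n: int) -> int:
    # Assumed opponent model: subtract 2 on even positions, 1 on odd ones.
    return -2 if n % 2 == 0 else -1


def toy_evaluate(n: int) -> float:
    # Assumed evaluation: closer to 10 is better for us.
    return -abs(n - 10)


best_move, best_value = select_move(
    6, toy_legal_moves, toy_apply, toy_nominal_reply, toy_evaluate
)
print(best_move, best_value)  # prints: 3 -2
```

Passing the engines in as callbacks keeps the selection loop independent of any particular engine, which mirrors the idea of using available engines as interchangeable components.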

We show that our architecture substantially improves the performance of the position evaluation engine. In other words, our architecture provides an additional layer of intelligence on top of the intelligence of the engines on which it is based. This is true for any engine, regardless of its strength: top engines such as Stockfish and Komodo Dragon (of varying strengths), as well as weaker engines.

Theoretically, our methodology relies on generic cost improvement properties and the superlinear convergence framework of Newton's method, which fundamentally underlies approximation in value space, and related MPC/RL and rollout/policy iteration schemes. A critical requirement of this framework is that the first lookahead step should be executed exactly. This fact has guided our architectural choices, and is apparently an important factor in improving the performance of even the best available chess engines.
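For orientation, approximation in value space with one-step lookahead is commonly written as below. The notation is an assumption here, since the description itself states no formulas: $x$ is the state (position), $u$ the control (our move), $w$ the disturbance (the opponent's reply), $f$ the system function, $g$ the stage cost, $\alpha$ the discount factor, and $\tilde{J}$ the cost function approximation (in this setting, supplied by the position evaluation engine).

```latex
\tilde{\mu}(x) \in \arg\min_{u \in U(x)}
  E_{w}\Big\{ g(x,u,w) + \alpha\, \tilde{J}\big(f(x,u,w)\big) \Big\}
```

The minimization over all $u \in U(x)$ is the first lookahead step that the framework requires to be executed exactly. In the Newton's method view, the cost function $J_{\tilde{\mu}}$ of the resulting policy is the result of a Newton step for solving Bellman's equation, started at $\tilde{J}$.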


Related videos

Reinforcement Learning, Model Predictive Control, and the Newton Step for Solving Bellman's Equation

Plenary lecture at IFAC Nonlinear MPC, 2024; Model Predictive Control and Reinforcement Learning

Abstract Dynamic Programming, Reinforcement Learning, Newton's Method, and Gradient Optimization

Lecture 12, 2025; Training of cost functions, approximation in policy space, policy gradient methods

Methods for Ab Initio Molecular Dynamics Simulations Using Hybrid DFT Functionals

MPC from Basics to Learning-based Design (1/2)

LLM scaling has hit a limit: an MIT study

PID vs. Other Control Methods: What's the Best Choice

Gradient descent, how neural networks learn | Chapter 2, Deep learning

New Directions in RL: TD(lambda), aggregation, seminorm projections, free-form sampling (from 2014)

Lecture 1, 2024, course overview: RL and DP, AlphaZero, discrete and continuous applications

But what is a neural network? | Chapter 1, Deep learning

Lecture 4, 2025, POMDP, Systems with Changing Parameters, Adaptive Control, Model Predictive Control

The world of AI agents has already arrived. What is changing right now

Reinforcement Learning Applied to Feedback Control

Lecture 11, 2025; Adversarial Problems, Minimax Rollout, Use of MPC Methods, Computer Chess

Lec 01. Introduction to Deep Learning

Lecture 1, 2025, Course overview: RL and DP, AlphaZero, deterministic DP, examples, applications

Tired of files? Here you go, sockets • C • Live coding

Lec 26: Health Care Economics
