New Directions in RL: TD(lambda), aggregation, seminorm projections, free-form sampling (from 2014)

Author: Dimitri Bertsekas

Uploaded: 2025-03-01

Views: 664

Description: This lecture explores three interrelated research directions in approximate dynamic programming and reinforcement learning:
1. Seminorm projections (unifying projected equation and aggregation approaches), generalized Bellman equations (multistep equations with state-dependent weights; the TD(lambda) equation is an example), and free-form sampling (a flexible alternative to single long trajectory simulation).
2. Aggregation and seminorm projected equations.
3. Simulation-based implementation of iterative and matrix inversion methods using free-form sampling.
Part of this material has appeared, in varying degrees of detail, in my 2012 DP book (Vol. II) and my 2022 Abstract DP book. Slides at http://www.mit.edu/~dimitrib/Gen_Bell...
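
As a rough illustration of the multistep idea mentioned in the description, here is a minimal Python sketch. It is not taken from the lecture or the slides. It assumes the standard constant-weight form of the TD(lambda) mapping for a fixed policy, T^(lambda) = (1 - lambda) * sum over l >= 0 of lambda^l * T^(l+1), where T is the one-step Bellman operator with discount factor alpha. The 2-state chain, the costs, and all names (P, g, alpha, lam, T, T_lambda, fixed_point) are hypothetical and chosen only for illustration. The sketch checks numerically that the one-step and multistep mappings have the same fixed point.

```python
# Hypothetical 2-state Markov chain under a fixed policy (illustrative numbers only).
P = [[0.9, 0.1],
     [0.2, 0.8]]      # transition probabilities
g = [1.0, 2.0]        # expected one-stage costs
alpha = 0.9           # discount factor
lam = 0.5             # the lambda in TD(lambda)


def T(J):
    """One-step Bellman operator: (TJ)(i) = g(i) + alpha * sum_j P[i][j] * J(j)."""
    n = len(J)
    return [g[i] + alpha * sum(P[i][j] * J[j] for j in range(n)) for i in range(n)]


def T_lambda(J, terms=200):
    """Truncated multistep operator: (1 - lam) * sum_{l=0}^{terms-1} lam**l * T^(l+1) J."""
    out = [0.0] * len(J)
    TJ = J
    for l in range(terms):
        TJ = T(TJ)                      # TJ now holds T^(l+1) J
        w = (1 - lam) * lam ** l        # geometric weight on the (l+1)-step mapping
        out = [o + w * t for o, t in zip(out, TJ)]
    return out


def fixed_point(op, n, iters):
    """Plain fixed-point iteration J <- op(J), starting from zero."""
    J = [0.0] * n
    for _ in range(iters):
        J = op(J)
    return J


J_one_step = fixed_point(T, 2, iters=2000)
J_multistep = fixed_point(T_lambda, 2, iters=200)

# Largest componentwise difference between the two fixed points.
gap = max(abs(a - b) for a, b in zip(J_one_step, J_multistep))
print(J_one_step, J_multistep, gap)
```

The geometric weights here do not depend on the state. The generalized Bellman equations named in the description allow the weights to vary by state, which this sketch does not attempt to show.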

