Riccardo Zamboni - Pure Exploration in POMDP: limits and possible solutions

Author: RL and Agents Reading Group

Uploaded: 2024-08-23

Views: 136

Description: UoE RL Reading Group | 22 August 2024

Speaker: Riccardo Zamboni (Politecnico di Milano)

Title: Pure Exploration in POMDP: limits and possible solutions

Abstract: The problem of pure exploration in MDPs has been cast as maximizing the entropy of the state distribution induced by the agent's policy, an objective that has been studied extensively. However, little attention has been paid to state entropy maximization under partial observability, even though partial observability is ubiquitous in applications such as finance and robotics, where the agent only receives noisy observations of the true state governing the system's dynamics. How can we address state entropy maximization in those domains? In this talk, we first provide lower and upper bounds on the approximation of the true state entropy that depend only on some properties of the observation function. Then, we study the simple approach of maximizing the entropy over observations in place of the true latent states, and we show how knowledge of the latter can be exploited to compute a principled regularization of the observation entropy that improves performance. Finally, we briefly provide some insights into possible ways to go beyond this approach and take into account beliefs over the latent states.
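As a rough illustration of why observation entropy can be a poor stand-in for state entropy, here is a minimal sketch assuming finite state and observation spaces and a known observation matrix `obs_matrix[s][o]` (the probability of observing `o` in state `s`). All function names are hypothetical, not taken from the talk or paper. The lower bound used here is only the generic information-theoretic fact H(S) ≥ I(S;O) = H(O) − H(O|S) ≥ H(O) − max_s H(O|S=s); it is not the bounds derived in the talk.

```python
import math


def entropy(p):
    """Shannon entropy in nats of a discrete distribution (list of probabilities)."""
    return -sum(x * math.log(x) for x in p if x > 0)


def observation_distribution(d_state, obs_matrix):
    """Push a state distribution through the (assumed known) observation function.

    d_obs[o] = sum_s d_state[s] * obs_matrix[s][o]
    """
    n_obs = len(obs_matrix[0])
    return [
        sum(d_state[s] * obs_matrix[s][o] for s in range(len(d_state)))
        for o in range(n_obs)
    ]


def state_entropy_lower_bound(d_obs, obs_matrix):
    """Generic bound H(S) >= H(O) - max_s H(O | S = s), clamped at zero.

    Follows from H(S) >= I(S; O) = H(O) - H(O | S) and H(O | S) <= max_s H(O | S = s).
    It needs only observation statistics and the observation function, not d_state.
    """
    worst_noise = max(entropy(row) for row in obs_matrix)
    return max(0.0, entropy(d_obs) - worst_noise)


if __name__ == "__main__":
    # The agent always sits in state 0, but every observation is a fair coin flip.
    d_state = [1.0, 0.0]
    noisy = [[0.5, 0.5], [0.5, 0.5]]
    d_obs = observation_distribution(d_state, noisy)
    print("H(S) =", entropy(d_state))
    print("H(O) =", entropy(d_obs))
    print("lower bound on H(S) =", state_entropy_lower_bound(d_obs, noisy))
```

In this toy case the observation entropy reaches its maximum, log 2, while the true state entropy is zero, so an agent maximizing H(O) would be rewarded for noise rather than for exploring. The generic lower bound correctly collapses to zero; with a noiseless (identity) observation matrix and a uniform state distribution it instead equals H(S) = log 2.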

Link: https://arxiv.org/pdf/2406.12795

Bio: Riccardo is a PhD student under the supervision of M. Restelli at Politecnico di Milano. His research focuses on developing principled algorithms to overcome current limitations in Multi-Agent RL.
