Probabilistic Inference in Reinforcement Learning Done Right


Abstract

In reinforcement learning (RL), an agent acts so as to maximize its return under uncertainty. It is natural to apply Bayesian probabilistic inference to the uncertain parameters and, since the goal of the agent is to find the optimal policy, a relevant object of study is the posterior probability of optimality for each state-action pair. Previous work on 'RL as inference' has equipped the agent with a surrogate potential in order to estimate this quantity; however, the approximation can be arbitrarily poor, leading to algorithms that do not perform well in practice. In this work, we rigorously analyze how the posterior probability of optimality flows through the Markov decision process (MDP) and show that sampling according to this probability yields a guaranteed Bayesian regret bound. In practice, computing this probability is intractable, so we derive a variational Bayesian approximation that yields a tractable convex optimization problem, and we show that the resulting policy also satisfies a Bayesian regret bound. We call our approach VAPOR and show that it has deep connections to Thompson sampling, K-learning, information theory, and maximum entropy exploration.
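The idea of acting according to the posterior probability of optimality can be illustrated in the simplest Bayesian setting. The sketch below is a toy Bernoulli bandit, not the paper's VAPOR method (which handles full MDPs via a variational convex program): it estimates each arm's posterior probability of being optimal by Monte Carlo over Beta posteriors and then samples an arm from that distribution. All names and parameters here are illustrative assumptions, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 3-armed Bernoulli bandit with a Beta(1, 1) posterior over each arm's mean reward.
n_arms = 3
true_means = np.array([0.3, 0.5, 0.7])  # unknown to the agent
alphas = np.ones(n_arms)  # Beta posterior parameter: 1 + observed successes
betas = np.ones(n_arms)   # Beta posterior parameter: 1 + observed failures

def prob_optimal(alphas, betas, n_samples=10_000):
    """Monte Carlo estimate of the posterior probability that each arm is optimal."""
    draws = rng.beta(alphas, betas, size=(n_samples, len(alphas)))
    best = draws.argmax(axis=1)
    return np.bincount(best, minlength=len(alphas)) / n_samples

for t in range(500):
    p_opt = prob_optimal(alphas, betas)
    arm = rng.choice(n_arms, p=p_opt)              # act according to P(arm is optimal)
    reward = int(rng.random() < true_means[arm])   # Bernoulli reward
    alphas[arm] += reward
    betas[arm] += 1 - reward

print("posterior probability of optimality:", prob_optimal(alphas, betas))
```

In this bandit special case, selecting an arm with probability equal to its posterior probability of being optimal coincides with Thompson sampling, which is one of the connections the abstract highlights; the contribution of the paper concerns how this probability behaves across the states and actions of an MDP and how to approximate it tractably.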

Authors

Jean Tarbouriech, Tor Lattimore, Brendan O'Donoghue

Venue

NeurIPS 2023