Abstract
We propose a novel algorithmic framework for distributional reinforcement learning, based on learning finite-dimensional mean embeddings of return distributions. We derive several new algorithms for dynamic programming and temporal-difference learning based on this framework, provide asymptotic convergence theory, and examine the empirical performance of the algorithms on a suite of tabular tasks. Further, we show that this approach can be straightforwardly combined with deep reinforcement learning, and obtain a new deep RL agent that improves over baseline distributional approaches on the Arcade Learning Environment.
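The abstract leaves the construction to the body of the paper; purely as an illustrative sketch (not the authors' implementation), the Python snippet below shows one way a temporal-difference update over finite-dimensional mean embeddings might look. It assumes moment features phi(z) = (1, z, ..., z^K), a choice for which the matrix mapping the embedding of the return G to the embedding of r + gamma*G is exact by the binomial theorem; the function and variable names here are hypothetical.

import numpy as np
from math import comb

def bellman_coeff_matrix(r, gamma, K):
    # Matrix B satisfying phi(r + gamma*z) = B @ phi(z) for the
    # moment features phi(z) = (1, z, ..., z^K), by the binomial theorem.
    B = np.zeros((K + 1, K + 1))
    for k in range(K + 1):
        for j in range(k + 1):
            B[k, j] = comb(k, j) * r ** (k - j) * gamma ** j
    return B

def td_update(m, x, r, x_next, gamma, alpha=0.1):
    # One TD step on the table m of per-state mean embeddings,
    # shape (num_states, K + 1): move m[x] toward the embedding
    # of the bootstrapped return r + gamma * G(x_next).
    B = bellman_coeff_matrix(r, gamma, m.shape[1] - 1)
    m[x] += alpha * (B @ m[x_next] - m[x])

Initializing each row of m to phi(0) = (1, 0, ..., 0) and applying td_update along sampled transitions then estimates the first K moments of the return distribution at each state; other feature maps would require an approximate Bellman coefficient matrix instead.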
Authors
Li Kevin Wenliang, Grégoire Delétang, Matthew Aitchison, Marcus Hutter, Anian Ruoss, Arthur Gretton, Mark Rowland
Venue
arXiv