Abstract
Replaying data is a principal mechanism underlying the stability and data efficiency of off-policy reinforcement learning (RL) experiments. We present an effective yet simple framework to extend the use of replay data across experiments, minimally adapting the RL engineering workflow for sizeable improvements in controller performance. At its core, Replay across Experiments (RaE) involves reusing experience from previous experiments to improve exploration, bootstrap learning and ultimately obtain stronger performance. We empirically demonstrate the robustness and benefits of our approach on a number of RL algorithms and challenging control domains spanning both locomotion and manipulation, including sparsely rewarded tasks with egocentric vision. Furthermore, we demonstrate how RaE can be leveraged in settings with available existing offline datasets to achieve state-of-the-art performance. Finally, through various ablations we demonstrate the robustness of our approach to the underlying algorithm, quality and amount of data available and various hyperparameter choices.
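To make the core idea concrete, the sketch below illustrates one plausible way to realize it: seeding a new experiment's replay buffer with transitions logged by earlier experiments before online data collection begins. This is only a minimal illustration under assumed names (ReplayBuffer, seed_from_previous_experiments), not the authors' implementation.

```python
# Minimal sketch of reusing replay data across experiments.
# All class and function names here are illustrative assumptions,
# not the paper's actual codebase.
import random
from collections import deque
from typing import Deque, List, Tuple

# (state, action, reward, next_state, done)
Transition = Tuple[object, object, float, object, bool]


class ReplayBuffer:
    def __init__(self, capacity: int) -> None:
        # Bounded FIFO storage; oldest transitions are evicted first.
        self.storage: Deque[Transition] = deque(maxlen=capacity)

    def add(self, transition: Transition) -> None:
        self.storage.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling over mixed old-experiment and fresh transitions.
        return random.sample(self.storage, min(batch_size, len(self.storage)))


def seed_from_previous_experiments(buffer: ReplayBuffer,
                                   old_transitions: List[Transition]) -> None:
    """Preload the buffer with transitions saved by earlier experiments."""
    for transition in old_transitions:
        buffer.add(transition)


# Usage (hypothetical): load transitions persisted by a prior run, seed the
# buffer, then train the off-policy agent as usual; new online experience is
# appended alongside the reused data.
```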
Authors
Dhruva Tirumala, Thomas Lampe, Jose Enrique Chen, Tuomas Haarnoja, Sandy Huang, Guy Lever, Ben Moran, Tim Hertweck, Leonard Hasenclever, Martin Riedmiller, Nicolas Heess, Markus Wulfmeier
Venue
arXiv