
Robust Exploration via Clustering-based Density Estimation



Intrinsic motivation is a critical ingredient in reinforcement learning to enable progress when rewards are sparse. However, many existing approaches that measure the novelty of observations are brittle, or rely on restrictive assumptions about the environment that limit their generality. We propose to decompose the exploration problem into two orthogonal sub-problems: (i) finding the right representation (metric) for exploration, and (ii) estimating densities in this representation space.

To address (ii), we introduce Robust Exploration via Clustering-based Online Density Estimation (RECODE), a non-parametric method that estimates visitation counts for clusters of states that are similar according to the metric induced by any representation learning technique. We adapt classical clustering algorithms to design a new type of memory that allows RECODE to keep track of the history of interactions over thousands of episodes, thus effectively tracking global visitation counts. This is in contrast to existing non-parametric approaches, which can only store the recent history, typically the current episode. The generality of RECODE allows us to easily address (i) by leveraging both off-the-shelf and novel representation learning techniques. In particular, we introduce a novel generalization of the action-prediction representation that leverages multi-step predictions and that we find to be better suited to a suite of challenging 3D-exploration tasks in DM-HARD-8.
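To make the idea of a clustering-based count memory more concrete, here is a minimal, simplified sketch in Python. It is not the RECODE algorithm from the paper: the class name `ClusterMemory`, the parameters `capacity`, `radius` and `decay`, the count-weighted eviction rule and the `1/sqrt(n + 1)` bonus are all illustrative assumptions. It only shows the general shape of the technique: embeddings that fall close to an existing cluster centre increment that cluster's (decayed) visit count, unfamiliar embeddings open new clusters in a bounded memory, and the exploration bonus shrinks as a cluster's count grows. The embeddings are assumed to come from whatever representation is chosen for sub-problem (i).

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class ClusterMemory:
    """Hypothetical bounded memory of cluster centres with decayed visit counts."""
    capacity: int = 1000   # assumption: maximum number of clusters kept
    radius: float = 1.0    # assumption: max distance for joining an existing cluster
    decay: float = 0.999   # assumption: per-update discount applied to every count
    centres: list = field(default_factory=list)
    counts: list = field(default_factory=list)

    def _nearest(self, x):
        """Return (index, distance) of the closest centre, or (-1, inf) if empty."""
        best_i, best_d = -1, math.inf
        for i, c in enumerate(self.centres):
            d = math.dist(x, c)
            if d < best_d:
                best_i, best_d = i, d
        return best_i, best_d

    def update(self, x, rng=random):
        """Record a visit to embedding x and return the count of its cluster."""
        # Decay all counts so very old visits are gradually forgotten.
        self.counts = [n * self.decay for n in self.counts]
        i, d = self._nearest(x)
        if i >= 0 and d <= self.radius:
            # Join the nearest cluster: bump its count and move its centre
            # towards x with a running-mean step.
            self.counts[i] += 1.0
            n = self.counts[i]
            self.centres[i] = [c + (xi - c) / n for c, xi in zip(self.centres[i], x)]
            return self.counts[i]
        if len(self.centres) >= self.capacity:
            # Memory is full: evict one cluster, preferring rarely visited ones
            # (probability inversely proportional to its count).
            weights = [1.0 / (n + 1e-8) for n in self.counts]
            j = rng.choices(range(len(self.centres)), weights=weights)[0]
            del self.centres[j]
            del self.counts[j]
        # Open a new cluster centred on x.
        self.centres.append(list(x))
        self.counts.append(1.0)
        return 1.0

    def bonus(self, x):
        """Assumed count-based intrinsic reward: 1 / sqrt(count + 1)."""
        i, d = self._nearest(x)
        n = self.counts[i] if (i >= 0 and d <= self.radius) else 0.0
        return 1.0 / math.sqrt(n + 1.0)


if __name__ == "__main__":
    mem = ClusterMemory(capacity=2, radius=0.5, decay=1.0)
    for _ in range(3):
        mem.update([0.0, 0.0])
    print(mem.bonus([0.1, 0.0]))  # same cluster, count 3 -> 0.5
    print(mem.bonus([5.0, 5.0]))  # far from every centre -> 1.0
```

Because the memory stores cluster centres rather than raw states, its size stays bounded regardless of how many episodes are observed, which is the property that lets a count estimate persist well beyond the current episode.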

We show experimentally that our approach works with a variety of RL agents and obtains state-of-the-art performance on Atari and DM-HARD-8.


Alaa Saade, Steven Kapturowski, Daniele Calandriello, Charles Blundell, Pablo Sprechmann, Leopoldo Sarra*, Oliver Groth, Bilal Piot, Michal Valko


ICLR 2024