Jump to Content

Partition Tree Weighting for Non-Stationary Stochastic Bandits

Published
View publication Download

Abstract

This paper considers a generalisation of universal source coding for interaction data, namely data streams that have actions interleaved with observations. Our goal will be to construct a coding distribution that is both universal and can be used as a control policy. Allowing for action generation needs careful treatment, as naive approaches which do not distinguish between actions and observations run into the self-delusion problem in universal settings. We showcase our perspective in the context of the challenging non-stationary stochastic bernoulli bandit problem. Our main contribution is an efficient and high performing algorithm for the non-stationary Stochastic bernoulli bandit problem that generalises the Partition Tree Weighting universal source coding technique for passive prediction to the control setting.

Authors

aixi , Marcus Hutter, Andras Gyorgy, Jordi Grau

Venue

arXiv