Abstract
Progress in the fields of machine learning and adversarial planning has benefited significantly from benchmark domains, from Checkers and Chess, the classic UCI data sets, and the Netflix challenge, to BLEU, Atari, Go, Poker, Starcraft, Dota2, and Diplomacy. In sequential decision-making, agent evaluation has largely been restricted to very few interactions against experts, declaring victory upon reaching some desired level of performance (e.g., human professional). In this paper, we propose a benchmark for multiagent learning based on repeated play of the simple game Rock, Paper, Scissors along with a population of forty-three tournament entries, some of which are (intentionally) sub-optimal. We describe metrics to measure the quality of agents based both on average returns and exploitability. We then show that several recent RL and online learning approaches can learn good counter-strategies and generalize well, but ultimately lose to the top-performing bots, creating an opportunity for research in multiagent learning.
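As a minimal illustration of the two metrics named in the abstract, the sketch below computes average return and exploitability for mixed strategies in single-shot Rock, Paper, Scissors. It is a simplification under our own assumptions, not the paper's evaluation protocol (which scores agents over repeated play against the population of tournament entries); the function names and payoff convention are illustrative only.

```python
import numpy as np

# Row player's payoff matrix for Rock-Paper-Scissors,
# with actions indexed as [Rock, Paper, Scissors].
PAYOFF = np.array([
    [ 0, -1,  1],
    [ 1,  0, -1],
    [-1,  1,  0],
], dtype=float)


def average_return(strategy: np.ndarray, opponent: np.ndarray) -> float:
    """Expected payoff of a mixed `strategy` against a mixed `opponent`."""
    return float(strategy @ PAYOFF @ opponent)


def exploitability(strategy: np.ndarray) -> float:
    """Payoff a best-responding opponent achieves against `strategy`.

    Since single-shot RPS is zero-sum with game value 0, this measures
    how far `strategy` is (in payoff) from the uniform Nash equilibrium.
    """
    # The opponent's payoff is the negative of the row player's payoff.
    opponent_payoffs = -PAYOFF.T @ strategy
    return float(opponent_payoffs.max())


if __name__ == "__main__":
    uniform = np.array([1 / 3, 1 / 3, 1 / 3])
    rock_heavy = np.array([0.6, 0.2, 0.2])
    print(exploitability(uniform))             # ~0.0: unexploitable
    print(exploitability(rock_heavy))          # 0.4: a Paper-heavy opponent profits
    print(average_return(rock_heavy, uniform)) # 0.0: uniform play neutralizes any strategy
```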
Authors
Marc Lanctot, John Schultz, Neil Burch, Max Olan Smith, Daniel Hennes, Thomas Anthony, Julien Perolat
Venue
Transactions on Machine Learning Research (TMLR)