
Population-based Evaluation in Repeated Rock-Paper-Scissors as a Benchmark for Multiagent Reinforcement Learning


Abstract

Progress in fields of machine learning and adversarial planning has benefited significantly from benchmark domains, from Checkers and Chess, the classic UCI data sets and the Netflix challenge, to BLEU, Atari, Go, Poker, Starcraft, Dota 2, and Diplomacy. In sequential decision-making, agent evaluation has largely been restricted to very few interactions against experts, declaring victory upon reaching some desired level of performance (e.g. human professional). In this paper, we propose a benchmark for multiagent learning based on repeated play of the simple game Rock, Paper, Scissors along with a population of forty-three tournament entries, some of which are (intentionally) sub-optimal. We describe metrics to measure the quality of agents based both on average returns and exploitability. We then show that several recent RL and online learning approaches can learn good counter-strategies and generalize well, but ultimately lose to the top-performing bots, creating an opportunity for research in multiagent learning.
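To make the exploitability metric concrete, here is a minimal sketch (not the paper's code, and simplified to the one-shot game rather than repeated play): given a mixed strategy over Rock, Paper, Scissors, its exploitability is the value a best-responding opponent can earn per round. Since the game's value is zero, a uniform mix is unexploitable, while any bias can be punished. The function and matrix names are illustrative assumptions.

```python
# Hypothetical sketch: exploitability of a mixed strategy in
# one-shot Rock, Paper, Scissors (zero-sum, game value 0).
# Row-player payoff matrix; rows/cols ordered (Rock, Paper, Scissors).
PAYOFF = [
    [0, -1, 1],   # Rock:     ties Rock, loses to Paper, beats Scissors
    [1, 0, -1],   # Paper:    beats Rock, ties Paper, loses to Scissors
    [-1, 1, 0],   # Scissors: loses to Rock, beats Paper, ties Scissors
]

def exploitability(strategy):
    """Best-response value against `strategy`: the expected payoff
    per round that an optimal counter-strategy achieves."""
    # Expected payoff of each pure counter-action against the mix.
    values = [sum(PAYOFF[a][b] * strategy[b] for b in range(3))
              for a in range(3)]
    return max(values)

print(exploitability([1/3, 1/3, 1/3]))  # uniform mix -> 0.0 (unexploitable)
print(exploitability([1.0, 0.0, 0.0]))  # always-Rock -> 1.0 (Paper wins every round)
```

In the repeated setting studied in the paper, strategies can condition on history, so exploitability is computed against a best responder over sequences of play rather than a single mixed action, but the same principle applies.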

Authors

Marc Lanctot, John Schultz, Neil Burch, Max Olan Smith, Daniel Hennes, Thomas Anthony, Julien Perolat

Venue

Transactions on Machine Learning Research (TMLR)