Progress in the fields of machine learning and adversarial planning has benefited significantly from benchmark domains, from Checkers and Chess, the classic UCI data sets, and the Netflix challenge to BLEU, Atari, Go, Poker, StarCraft, Dota 2, and Diplomacy. In sequential decision-making, agent evaluation has largely been restricted to a few interactions against experts, declaring victory upon reaching some desired level of performance (e.g., human professional play). In this paper, we propose a benchmark for multiagent learning based on repeated play of the simple game Rock, Paper, Scissors, along with a population of forty-three tournament entries, some of which are (intentionally) sub-optimal. We describe metrics to measure the quality of agents based both on average returns and on exploitability. We then show that several recent RL and online learning approaches can learn good counter-strategies and generalize well, but ultimately lose to the top-performing bots, creating an opportunity for research in multiagent learning.
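For concreteness, one standard way to formalize exploitability in a two-player zero-sum game such as Rock, Paper, Scissors (a sketch of the usual definition, not necessarily the exact formulation used later in the paper) is the gap between the minimax value of the game and the worst-case return of a strategy against a best response:
\[
\mathrm{expl}(\sigma_i) \;=\; v^{*} \;-\; \min_{\sigma_{-i}} u_i(\sigma_i, \sigma_{-i}),
\]
where $u_i$ is player $i$'s expected return, $\sigma_{-i}$ ranges over the opponent's strategies, and $v^{*}$ is the value of the game ($v^{*} = 0$ in symmetric zero-sum RPS). Under this definition, an exact Nash equilibrium strategy has exploitability $0$, while any deviation from it can be punished by a best-responding opponent.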