DeepMind and Blizzard open StarCraft II as an AI research environment
DeepMind's scientific mission is to push the boundaries of AI by developing systems that can learn to solve complex problems. To do this, we design agents and test their ability in a wide range of environments, from the purpose-built DeepMind Lab to established games such as Atari and Go.
Testing our agents in games that are not specifically designed for AI research, and where humans play well, is crucial to benchmark agent performance. That is why we, along with our partner Blizzard Entertainment, are excited to announce the release of SC2LE (the StarCraft II Learning Environment), a set of tools that we hope will accelerate AI research in the real-time strategy game StarCraft II. The SC2LE release includes:
- A Machine Learning API developed by Blizzard that gives researchers and developers hooks into the game. This includes the release of tools for Linux for the first time.
- A dataset of anonymised game replays, which will increase from 65k to more than half a million in the coming weeks.
- An open source version of DeepMind’s toolset, PySC2, to allow researchers to easily use Blizzard’s feature-layer API with their agents.
- A series of simple RL mini-games to allow researchers to test the performance of agents on specific tasks.
- A joint paper that outlines the environment, and reports initial baseline results on the mini-games, supervised learning from replays, and the full 1v1 ladder game against the built-in AI.
StarCraft and StarCraft II are among the biggest and most successful games of all time, with players competing in tournaments for more than 20 years. The original game is also already used by AI and ML researchers, who compete annually in the AIIDE bot competition. Part of StarCraft’s longevity is down to the rich, multi-layered gameplay, which also makes it an ideal environment for AI research.
For example, while the objective of the game is to beat the opponent, the player must also carry out and balance a number of sub-goals, such as gathering resources or building structures. In addition, a game can take from a few minutes to one hour to complete, meaning actions taken early in the game may not pay off for a long time. Finally, the map is only partially observed, meaning agents must use a combination of memory and planning to succeed.
The game also has other qualities that appeal to researchers, such as the large pool of avid players that compete online every day. This ensures that there is a large quantity of replay data to learn from - as well as a large quantity of extremely talented opponents for AI agents.
Even StarCraft’s action space presents a challenge, with a choice of more than 300 basic actions that can be taken. Contrast this with Atari games, which only have about 10 (up, down, left, right and so on). On top of this, actions in StarCraft are hierarchical, can be modified and augmented, and many of them require a point on the screen. Even assuming a small screen resolution of 84x84, there are roughly 100 million possible actions.
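To get a feel for where a number like that comes from, here is a quick back-of-the-envelope calculation in Python; the action names below are purely illustrative, not an enumeration of the real action set.

```python
# Counting argument combinations for spatial actions at an 84x84
# feature-layer resolution. The action names are illustrative only.
screen_points = 84 * 84            # 7,056 distinct screen coordinates

point_action = screen_points       # an action taking one target point,
                                   # e.g. a move command: ~7,000 variants
rect_action = screen_points ** 2   # an action taking two corner points,
                                   # e.g. a rectangle selection: ~50 million variants

print(point_action)  # 7056
print(rect_action)   # 49787136
```

With a few hundred base actions, several of which take one or two coordinates plus extra modifiers, the combined space quickly reaches the roughly 100 million combinations quoted above.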
This release means researchers can now tackle some of these challenges using Blizzard’s own tools to build their own tasks and models.
Our PySC2 environment wrapper helps by offering a flexible and easy-to-use interface for RL agents to play the game. In this initial release, we break the game down into “feature layers”, where elements of the game such as unit type, health and map visibility are isolated from each other, whilst preserving the core visual and spatial elements of the game.
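As a concrete illustration (a minimal sketch, assuming PySC2 is installed; the exact set of layers can vary between versions), the library enumerates the screen and minimap feature layers it exposes:

```python
# List the feature layers PySC2 exposes for the screen and minimap.
# Each layer isolates one aspect of the game state (unit type, health,
# visibility and so on) as its own 2D plane.
from pysc2.lib import features

print([f.name for f in features.SCREEN_FEATURES])
# e.g. ['height_map', 'visibility_map', 'creep', ..., 'unit_type',
#       'selected', 'unit_hit_points', ...]

print([f.name for f in features.MINIMAP_FEATURES])
# e.g. ['height_map', 'visibility_map', 'creep', 'camera',
#       'player_id', 'player_relative', 'selected']
```

An agent's observation at each step is a stack of these planes, which is what makes them convenient inputs for convolutional networks.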
The release also contains a series of ‘mini-games’ - an established technique for breaking down the game into manageable chunks that can be used to test agents on specific tasks, such as moving the camera, collecting mineral shards or selecting units. We hope that researchers will test their techniques on these, as well as propose new mini-games for other researchers to compete on and evaluate against.
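Here is a minimal sketch of how an agent plugs into these mini-games (the exact observation fields and run command depend on your PySC2 version):

```python
# A do-nothing agent built on PySC2's BaseAgent. At every step it receives
# the feature-layer observation and must return one of the available actions.
from pysc2.agents import base_agent
from pysc2.lib import actions


class NoOpAgent(base_agent.BaseAgent):
    def step(self, obs):
        super(NoOpAgent, self).step(obs)
        # obs.observation contains the feature layers described above;
        # here we ignore them and simply issue a no-op.
        return actions.FunctionCall(actions.FUNCTIONS.no_op.id, [])
```

An agent like this can then be pointed at one of the mini-game maps, for example via `python -m pysc2.bin.agent --map CollectMineralShards --agent your_module.NoOpAgent` (where `your_module` is wherever you saved the class), and the same interface supports scripted baselines or learned policies.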
Our initial investigations show that our agents perform well on these mini-games. But when it comes to the full game, even strong baseline agents, such as A3C, cannot win a single game against even the easiest built-in AI. For instance, the following video shows an early-stage training agent (left) which fails to keep its workers mining, a task that humans find trivial. After training (right), the agent performs more meaningful actions, but if our agents are to be competitive, we will need further breakthroughs in deep RL and related areas.
One technique that we know allows our agents to learn stronger policies is imitation learning. This kind of training will soon be far easier thanks to Blizzard, which has committed to ongoing releases of hundreds of thousands of anonymised replays gathered from the StarCraft II ladder. These will not only allow researchers to train supervised agents to play the game, but will also open up other interesting areas of research such as sequence prediction and long-term memory.
Our hope is that the release of these new tools will build on the work that the AI community has already done in StarCraft, encouraging more deep RL research and making it easier for researchers to focus on the frontiers of our field.
We look forward to seeing what the community discovers.