AlphaZero and MuZero
AlphaZero and MuZero are powerful, general AI systems that mastered a range of board games and video games, and they are now helping us solve real-world problems.
AlphaZero: A dynamic and creative player
AlphaZero represents a crucial step towards creating more general systems. It taught itself, from scratch, to master the board games of chess, shogi, and Go. In doing so, it became the strongest player in history for each.
The system is the successor to AlphaGo, the first AI to defeat a professional human Go player and one that inspired a new era of AI advances.
Unlike AlphaGo, which learned to play Go by analyzing millions of moves from amateur games, AlphaZero was given only the rules of each game.
It then learned each game by playing against itself millions of times. Through a process of trial and error, called reinforcement learning, the system learned to select the most promising moves and boost its chances of winning.
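To make learning by self-play concrete, here is a minimal sketch on a toy game. It is not AlphaZero's method: AlphaZero trains a deep neural network and guides its play with Monte Carlo tree search, whereas this sketch uses plain tabular Q-learning. The game (a take-away game where each move removes 1 to 3 sticks and whoever takes the last stick wins), the function names, and the hyperparameters are all illustrative assumptions. One agent plays both sides, explores with random moves (trial), and updates its value estimates from the outcomes (error).

```python
import random
from collections import defaultdict

MAX_TAKE = 3  # assumption for this toy game: each move removes 1 to 3 sticks


def legal_moves(sticks):
    """Moves available when `sticks` sticks remain."""
    return list(range(1, min(MAX_TAKE, sticks) + 1))


def train_by_self_play(start=10, episodes=20000, epsilon=0.3, seed=0):
    """Tabular Q-learning in which a single agent plays both sides.

    q[(s, a)] estimates the value of taking `a` sticks from a pile of `s`,
    from the point of view of the player making that move
    (+1 = win, -1 = loss). Taking the last stick wins.
    """
    rng = random.Random(seed)
    q = defaultdict(float)  # unseen state-action pairs start at 0
    for _ in range(episodes):
        sticks = rng.randint(1, start)  # random starting piles aid exploration
        while sticks > 0:
            moves = legal_moves(sticks)
            if rng.random() < epsilon:
                move = rng.choice(moves)  # explore: try a random move
            else:
                move = max(moves, key=lambda a: q[(sticks, a)])  # exploit
            remaining = sticks - move
            if remaining == 0:
                target = 1.0  # took the last stick, so this move wins
            else:
                # The opponent moves next, so our value is minus their best value.
                target = -max(q[(remaining, a)] for a in legal_moves(remaining))
            q[(sticks, move)] = target  # deterministic game: learning rate of 1
            sticks = remaining
    return q


def best_move(q, sticks):
    """Greedy move under the learned values."""
    return max(legal_moves(sticks), key=lambda a: q[(sticks, a)])
```

After `q = train_by_self_play()`, calling `best_move(q, s)` for piles of 5, 6, and 7 sticks should choose to take 1, 2, and 3 sticks respectively, leaving the opponent a multiple of four: a winning strategy the agent was never told, only the rules.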
AlphaZero mastered chess in just 9 hours, shogi in 12 hours, and Go in 13 days. In each game, it learned to play with a unique and creative style.
In chess, for example, the model developed a highly dynamic and “unconventional” playing style, which has since been studied at the highest levels of the game.
MuZero: AI that can plan
MuZero goes one step further than AlphaZero.
Without being told the rules of any game, MuZero matches AlphaZero’s level of performance in Go, chess, and shogi, and also learns to master a suite of visually complex Atari games.
It does this by learning a model of its environment, such as the game it's playing. MuZero then uses that model to plan the best course of action.
Crucially, it models only the three aspects of its environment that matter for its decision-making: How good is the current position (the value)? Which action is best to take (the policy)? And how good was the last action (the reward)?
All three are learned by a deep neural network, and together they are all MuZero needs to understand what happens when it takes an action and to plan accordingly.
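The idea of planning inside a learned model can be sketched in a few lines. In MuZero's published design, the model is split into a representation function (observation to hidden state), a dynamics function (hidden state and action to next hidden state and reward), and a prediction function (hidden state to policy and value). In the sketch below, those three functions are hand-written stand-ins for what MuZero learns with a deep neural network, and the toy environment (walk along a number line toward a goal at 0), the names, and the search are all assumptions for illustration. The planner is a simple depth-limited lookahead that expands only the actions the policy ranks highest; MuZero itself uses Monte Carlo tree search.

```python
from typing import Dict, Optional, Tuple

ACTIONS = (-1, 0, 1)  # hypothetical toy actions: step left, stay, step right
GOAL = 0              # hypothetical toy goal position


# Stand-ins for MuZero's three learned functions (hand-written here).
def represent(observation: int) -> float:
    """Representation: map a raw observation to a hidden state."""
    return float(observation)


def dynamics(hidden: float, action: int) -> Tuple[float, float]:
    """Dynamics: predict the next hidden state and the reward for `action`."""
    nxt = hidden + action
    reward = 1.0 if nxt == GOAL else 0.0  # "how good was the last action?"
    return nxt, reward


def predict(hidden: float) -> Tuple[Dict[int, float], float]:
    """Prediction: a policy prior over actions and a value estimate."""
    # Policy, "which action is best to take?": favour moves toward GOAL.
    scores = {a: 1.0 / (1.0 + abs(hidden + a - GOAL)) for a in ACTIONS}
    total = sum(scores.values())
    policy = {a: s / total for a, s in scores.items()}
    value = -abs(hidden - GOAL) / 10.0  # "how good is the current position?"
    return policy, value


def plan(hidden: float, depth: int, discount: float = 0.9,
         top_k: int = 2) -> Tuple[float, Optional[int]]:
    """Depth-limited lookahead that never touches the real environment.

    Expands only the `top_k` actions with the highest policy prior, sums
    predicted rewards, and bootstraps leaves with the predicted value.
    Returns (estimated discounted return, best first action).
    """
    policy, value = predict(hidden)
    if depth == 0:
        return value, None
    candidates = sorted(ACTIONS, key=lambda a: policy[a], reverse=True)[:top_k]
    best_return, best_action = float("-inf"), None
    for action in candidates:
        nxt, reward = dynamics(hidden, action)
        future, _ = plan(nxt, depth - 1, discount, top_k)
        total = reward + discount * future
        if total > best_return:
            best_return, best_action = total, action
    return best_return, best_action
```

For example, `plan(represent(3), depth=3)` chooses `-1`, stepping toward the goal, because the model predicts a reward two steps after that. The design choice to mirror is that the planner only ever calls `dynamics` and `predict`, never the real environment, which is what lets this style of agent plan when it has not been given the rules.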
MuZero’s ability to plan without being given the rules of its environment solves a long-standing research problem and is a significant step forward for AI systems that will need to solve complex, real-world problems.
Proving AI’s potential
Mastering games was only ever a proof of principle for AlphaZero and MuZero.
Both of these systems demonstrate that a single algorithm can learn how to discover new knowledge in a range of settings.
And both are crucial steps towards creating general and capable AI systems that can solve a wide range of real-world problems.
For example, new versions of AlphaZero have discovered faster sorting, hashing, and matrix multiplication algorithms, which are now used trillions of times a day across the world.
Meanwhile, MuZero is helping to compress YouTube videos more efficiently, reducing internet traffic while delivering millions of hours of content every day. That means better access to information and entertainment for billions of people.
And that’s just the beginning. Both are paving the way towards tackling new challenges in robotics, industrial systems, and other messy real-world environments where the “rules of the game” are not known.