Jump to Content


Population based training of neural networks


Max Jaderberg

Neural networks have shown great success in everything from playing Go and Atari games to image recognition and language translation. But often overlooked is that the success of a neural network at a particular application is often determined by a series of choices made at the start of the research, including what type of network to use and the data and method used to train it. Currently, these choices - known as hyperparameters - are chosen through experience, random search or a computationally intensive search processes.

In our most recent paper, we introduce a new method for training neural networks which allows an experimenter to quickly choose the best set of hyperparameters and model for the task. This technique - known as Population Based Training (PBT) - trains and optimises a series of networks at the same time, allowing the optimal set-up to be quickly found. Crucially, this adds no computational overhead, can be done as quickly as traditional techniques and is easy to integrate into existing machine learning pipelines.

The technique is a hybrid of the two most commonly used methods for hyperparameter optimisation: random search and hand-tuning. In random search, a population of neural networks are trained independently in parallel and at the end of training the highest performing model is selected. Typically, this means that a small fraction of the population will be trained with good hyperparameters, but many more will be trained with bad ones, wasting computer resources.

Random search of hyperparameters, where many hyperparameters are tried in parallel, but independently. Some hyperparameters will lead to models with good performance, but others will not

With hand tuning, researchers must guess at the best hyperparameters, train their models using them, and then evaluate the performance. This is done over and over, until the researcher is happy with the performance of the network. Although this can result in better performance, the downside is that this takes a long time, sometimes taking weeks or even months to find the perfect set-up. And while there are ways of automating this process - such as Bayesian optimisation - it still takes a long time and requires many sequential training runs to find the best hyperparameters.

Methods like hand tuning and Bayesian Optimisation make changes to the hyperparameters by observing many training runs sequentially, making these methods slow

PBT - like random search - starts by training many neural networks in parallel with random hyperparameters. But instead of the networks training independently, it uses information from the rest of the population to refine the hyperparameters and direct computational resources to models which show promise. This takes its inspiration from genetic algorithms where each member of the population, known as a worker, can exploit information from the remainder of the population. For example, a worker might copy the model parameters from a better performing worker. It can also explore new hyperparameters by changing the current values randomly.
As the training of the population of neural networks progresses, this process of exploiting and exploring is performed periodically, ensuring that all the workers in the population have a good base level of performance and also that new hyperparameters are consistently explored. This means that PBT can quickly exploit good hyperparameters, can dedicate more training time to promising models and, crucially, can adapt the hyperparameter values throughout training, leading to automatic learning of the best configurations.

Population Based Training of neural networks starts like random search, but allows workers to exploit the partial results of other workers and explore new hyperparameters as training progresses

Our experiments show that PBT is very effective across a whole host of tasks and domains. For example, we rigorously tested the algorithm on a suite of challenging reinforcement learning problems with state-of-the-art methods on DeepMind Lab, Atari, and StarCraft II. In all cases, PBT stabilised training, quickly found good hyperparameters, and delivered results that were beyond state-of-the-art baselines.

We have also found PBT to be effective for training Generative Adversarial Network (GAN), which are notoriously difficult to tune. Specifically, we used the PBT framework to maximise the Inception Score - a measure of visual fidelity - resulting in a significant improvement from 6.45 to 6.9.

We have also applied it to one of Google’s state-of-the-art machine translation neural networks, which are usually trained with carefully hand tuned hyperparameter schedules that take months to perfect. With PBT we automatically found hyperparameter schedules that match and even exceed existing performance, but without any tuning and in the same time it normally takes to do a single training run.

The evolution of the population during training of GANs on CIFAR-10 and Feudal Networks (FuN) on Ms Pacman. Pink dots represent initial agents, blue ones the final ones.

We believe this is only the beginning for the technique. At DeepMind, we have also found PBT is particularly useful for training new algorithms and neural network architectures that introduce new hyperparameters. As we continue to refine the process, it offers up the possibility of finding and developing ever more sophisticated and powerful neural network models.