AlphaProteo generates novel proteins for biology and health research

New AI system designs proteins that successfully bind to target molecules, with potential for advancing drug design, disease understanding and more.

Every biological process in the body, from cell growth to immune responses, depends on interactions between molecules called proteins. Like a key to a lock, one protein can bind to another, helping regulate critical cellular processes. Protein structure prediction tools like AlphaFold have already given us tremendous insight into how proteins interact with each other to perform their functions, but these tools cannot create new proteins to directly manipulate those interactions.

Scientists, however, can create novel proteins that successfully bind to target molecules. These binders can help researchers accelerate progress across a broad spectrum of research, including drug development, cell and tissue imaging, disease understanding and diagnosis – even crop resistance to pests. While recent machine learning approaches to protein design have made great strides, the process is still laborious and requires extensive experimental testing.

Today, we introduce AlphaProteo, our first AI system for designing novel, high-strength protein binders to serve as building blocks for biological and health research. This technology has the potential to accelerate our understanding of biological processes, and aid the discovery of new drugs, the development of biosensors and more.

AlphaProteo can generate new protein binders for diverse target proteins, including VEGF-A, which is associated with cancer and complications from diabetes. This is the first time an AI tool has been able to design a successful protein binder for VEGF-A.

AlphaProteo also achieves higher experimental success rates and 3 to 300 times better binding affinities than the best existing methods on seven target proteins we tested.

Learning the intricate ways proteins bind to each other

Protein binders that can bind tightly to a target protein are hard to design. Traditional methods are time intensive, requiring multiple rounds of extensive lab work. After the binders are created, they undergo additional experimental rounds to optimize binding affinity, so they bind tightly enough to be useful.

Trained on large amounts of protein data from the Protein Data Bank (PDB) and more than 100 million predicted structures from AlphaFold, AlphaProteo has learned the myriad ways molecules bind to each other. Given the structure of a target molecule and a set of preferred binding locations on that molecule, AlphaProteo generates a candidate protein that binds to the target at those locations.
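As a rough illustration of the two inputs described above, the sketch below models a design request as a target structure plus a set of preferred binding locations. AlphaProteo's actual interface is not described in this post, so every name here (`BinderDesignRequest`, `target_residues`, `binding_site_residues`) is hypothetical, and the check shown is only one plausible sanity test that a pipeline might run before design begins.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BinderDesignRequest:
    """Hypothetical container for the two inputs named in the text.

    This is NOT AlphaProteo's API; it only illustrates the shape of the inputs.
    """

    # Residue numbers present in the target structure (assumed representation).
    target_residues: tuple[int, ...]
    # Preferred binding locations on the target, given as residue numbers.
    binding_site_residues: tuple[int, ...]

    def __post_init__(self) -> None:
        if not self.binding_site_residues:
            raise ValueError("at least one preferred binding location is required")
        # Every requested binding location must exist in the target structure.
        missing = sorted(set(self.binding_site_residues) - set(self.target_residues))
        if missing:
            raise ValueError(f"binding locations not in target structure: {missing}")


# Illustrative residue numbers only; they do not correspond to a real target.
request = BinderDesignRequest(
    target_residues=tuple(range(1, 101)),
    binding_site_residues=(12, 15, 48),
)
```

A request whose binding locations fall outside the target structure is rejected up front, which keeps a malformed input from reaching any downstream design step.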

Illustration of a predicted protein binder structure interacting with a target protein. Shown in blue is a predicted protein binder structure generated by AlphaProteo, designed for binding to a target protein. Shown in yellow is the target protein, specifically the SARS-CoV-2 spike receptor-binding domain.

Demonstrating success on important protein binding targets

To test AlphaProteo, we designed binders for diverse target proteins: two viral proteins involved in infection (BHRF1 and the SARS-CoV-2 spike protein receptor-binding domain, SC2RBD) and five proteins involved in cancer, inflammation and autoimmune diseases (IL-7Rɑ, PD-L1, TrkA, IL-17A and VEGF-A).

Our system has highly competitive binding success rates and best-in-class binding strengths. For seven targets, AlphaProteo generated candidate proteins in silico that bound strongly to their intended proteins when tested experimentally.

A grid of illustrations of predicted structures of seven target proteins for which AlphaProteo generated successful binders. Shown in blue are examples of binders tested in the wet lab, shown in yellow are protein targets, and highlighted in dark yellow are intended binding regions.

For one particular target, the viral protein BHRF1, 88% of our candidate molecules bound successfully when tested in the Google DeepMind Wet Lab. Based on the targets tested, AlphaProteo binders also bind 10 times more strongly, on average, than binders produced by the best existing design methods.

For another target, TrkA, our binders are even stronger than the best previously designed binders for this target, which had been through multiple rounds of experimental optimization.

A blue and gray bar graph showing experimental in vitro success rates of AlphaProteo’s output for each of the seven target proteins, compared to other design methods. Higher success rates mean fewer designs must be tested to find successful binders.
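To make the link between success rate and testing effort concrete, here is a minimal sketch of the arithmetic. It assumes each design succeeds independently with the same probability, which is a simplification and not something the experiments establish; the function name `designs_needed` and the 95% confidence level are our own illustrative choices.

```python
import math


def designs_needed(success_rate: float, confidence: float = 0.95) -> int:
    """Smallest n such that P(at least one binder among n designs) >= confidence.

    Assumes every design succeeds independently with probability `success_rate`.
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    if success_rate == 1.0:
        return 1
    # Solve 1 - (1 - p)^n >= confidence for n, then round up to a whole design.
    n = math.log(1.0 - confidence) / math.log(1.0 - success_rate)
    return max(1, math.ceil(n))


# 88% is the BHRF1 success rate reported in this post.
print(designs_needed(0.88))  # 2
# 1% is a hypothetical low success rate, included only for contrast.
print(designs_needed(0.01))  # 299
```

Under these assumptions, an 88% success rate means two tested designs are enough for a 95% chance of at least one binder, whereas a hypothetical 1% success rate would require testing 299.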

A blue and gray bar graph showing the best affinity for AlphaProteo’s designs without experimental optimization for each of the seven target proteins, compared to other design methods. A lower value on the vertical axis means the binder protein binds more tightly to the target protein. Please note the logarithmic scale of the vertical axis.
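The "times better" comparisons and the logarithmic axis can be related with a short calculation. The sketch below assumes affinity is reported as a dissociation constant (K_D) in nanomolar, where smaller is tighter; this post does not state the unit explicitly, and the function names are illustrative. The 3-fold and 300-fold inputs are the range quoted earlier in this post, not individual measurements.

```python
import math


def fold_improvement(kd_baseline_nm: float, kd_new_nm: float) -> float:
    """How many times tighter the new binder is, as a ratio of K_D values.

    Assumes both values are dissociation constants in the same unit (nM here).
    """
    if kd_baseline_nm <= 0 or kd_new_nm <= 0:
        raise ValueError("K_D values must be positive")
    return kd_baseline_nm / kd_new_nm


def decades_on_log_axis(fold: float) -> float:
    """Vertical gap between two bars on a log10 axis, in powers of ten."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    return math.log10(fold)


# The post reports 3- to 300-fold better affinities across targets.
for fold in (3.0, 300.0):
    print(f"{fold:g}x tighter = {decades_on_log_axis(fold):.2f} decades on the axis")
```

On a log scale, a 3-fold improvement spans roughly half a decade while a 300-fold improvement spans about two and a half, so bars that look only moderately different in height can represent very large differences in binding strength.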

Validating our results

Beyond in silico validation and testing AlphaProteo in our wet lab, we engaged the research groups of Peter Cherepanov, Katie Bentley and David LV Bauer at the Francis Crick Institute to validate our protein binders. Across different experiments, they examined some of our stronger SC2RBD and VEGF-A binders in more depth. The research groups confirmed that the binding interactions of these binders were indeed similar to what AlphaProteo had predicted. Additionally, the groups confirmed that the binders have useful biological function. For example, some of our SC2RBD binders were shown to prevent SARS-CoV-2 and some of its variants from infecting cells.

AlphaProteo’s performance indicates that it could drastically reduce the time needed for initial experiments involving protein binders for a broad range of applications. However, we know that our AI system has limitations: it was unable to design successful binders against an eighth target, TNFɑ, a protein associated with autoimmune diseases like rheumatoid arthritis. We selected TNFɑ to robustly challenge AlphaProteo, as computational analysis showed that designing binders against it would be extremely difficult. We will continue to improve and expand AlphaProteo’s capabilities with the goal of eventually addressing such challenging targets.

Achieving strong binding is usually only the first step in designing proteins that might be useful for practical applications, and there are many more bioengineering obstacles to overcome in the research and development process.

Towards responsible development of protein design

Protein design is a fast-evolving technology that holds great potential for advancing science, from understanding the factors that cause disease, to accelerating diagnostic test development for virus outbreaks, supporting more sustainable manufacturing processes, and even cleaning contaminants from the environment.

To account for potential biosecurity risks, and building on our long-standing approach to responsibility and safety, we’re working with leading external experts to inform our phased approach to sharing this work. We are also feeding into community efforts to develop best practices, including the Nuclear Threat Initiative’s (NTI) new AI Bio Forum.

Going forward, we’ll be working with the scientific community to leverage AlphaProteo on impactful biology problems and understand its limitations. We've also been exploring its drug design applications at Isomorphic Labs, and are excited for what the future holds.

At the same time, we’re continuing to improve the success rate and affinity of AlphaProteo’s algorithms, expanding the range of design problems it can tackle, and working with researchers in machine learning, structural biology, biochemistry and other disciplines to develop a responsible and more comprehensive protein design offering for the community.

Read our whitepaper

Acknowledgements

This research was co-developed by our Protein Design team and Wet Lab team.

We’d like to thank our collaborators Peter Cherepanov, David Bauer, Katie Bentley and their groups at the Francis Crick Institute for their invaluable experimental insights and results, the AlphaFold team, whose earlier work and algorithms provided training inputs and evaluation insights, and the many other teams across Google DeepMind who contributed to this program.