AlphaFold reveals the structure of the protein universe

Authors

Demis Hassabis

Read about solving protein folding at deepmind.com/AlphaFold and see a timeline of our breakthrough here.

It’s been one year since we released and open sourced AlphaFold, our AI system to predict the 3D structure of a protein just from its 1D amino acid sequence, and created the AlphaFold Protein Structure Database (AlphaFold DB) to freely share this scientific knowledge with the world. Proteins are the building blocks of life; they underpin every biological process in every living thing. And, because a protein’s shape is closely linked with its function, knowing a protein’s structure unlocks a greater understanding of what it does and how it works. We hoped this groundbreaking resource would help accelerate scientific research and discovery globally, and that other teams could learn from and build on the advances we made with AlphaFold to create further breakthroughs. That hope has become a reality far quicker than we had dared to dream. Just twelve months later, AlphaFold has been accessed by more than half a million researchers and used to accelerate progress on important real-world problems ranging from plastic pollution to antibiotic resistance.

Today, I’m incredibly excited to share the next stage of this journey. In partnership with EMBL’s European Bioinformatics Institute (EMBL-EBI), we’re now releasing predicted structures for nearly all catalogued proteins known to science, which will expand the AlphaFold DB by over 200x - from nearly 1 million structures to over 200 million structures - with the potential to dramatically increase our understanding of biology.

A proportional circle chart showing the number of protein structures. AlphaFold DB today in dark blue forms the largest circle, representing over 200 million structures. A smaller light blue circle towards the bottom of the main circle represents AlphaFold DB previously, which was around 1 million structures. Within that circle, a smaller purple one represents Experimental (PDB) today, made up of 190,000 structures.

This update includes predicted structures for plants, bacteria, animals, and other organisms, opening up many new opportunities for researchers to use AlphaFold to advance their work on important issues, including sustainability, food insecurity, and neglected diseases.

Today’s update means that most pages on the main protein database UniProt will come with a predicted structure. All 200+ million structures will also be available for bulk download via Google Cloud Public Datasets, making AlphaFold even more accessible to scientists around the world.

AlphaFold is the singular and momentous advance in life science that demonstrates the power of AI. Determining the 3D structure of a protein used to take many months or years, it now takes seconds. AlphaFold has already accelerated and enabled massive discoveries, including cracking the structure of the nuclear pore complex. And with this new addition of structures illuminating nearly the entire protein universe, we can expect more biological mysteries to be solved each day.

Eric Topol
Founder and Director of the Scripps Research Translational Institute

AlphaFold’s impact so far

Twelve months on from AlphaFold’s initial release, it’s been amazing to reflect on the incredible impact AlphaFold has already had, and our long journey to reach today’s milestone.

For our team, AlphaFold’s success was especially rewarding, both because it was the most complex AI system we’d ever built, requiring multiple critical innovations, and because it has had the most meaningful downstream impact. By demonstrating that AI could accurately predict the shape of a protein down to atomic accuracy, at scale and in minutes, AlphaFold not only provided a solution to a 50-year grand challenge, it also became the first big proof point of our founding thesis: that artificial intelligence can dramatically accelerate scientific discovery, and in turn advance humanity.

We open sourced AlphaFold’s code and published two in-depth papers in Nature [1, 2], which have already been cited more than 4000 times. We collaborated closely with the world-leading EMBL-EBI to design a tool that would best help biologists access and use AlphaFold, and together released the AlphaFold DB, a searchable database that is open and free to all. Before releasing AlphaFold, in line with our careful approach to pioneering responsibly, we sought input from more than 30 experts across biology research, security, ethics and safety to help us understand how to share the benefits of AlphaFold with the world, in a way that would maximise potential benefit and minimise potential risk.

To date, more than 500,000 researchers from 190 countries have accessed the AlphaFold DB to view over 2 million structures. Our freely available structures have also been integrated into other public datasets, such as Ensembl, UniProt, and OpenTargets, where millions of users access them as part of their everyday workflows.

Six designs of AlphaFold predictions referenced in publications. Each one has a short description below it. They are as follows:

Nuclear pore complex protein Nup205.
Part of a large complex that acts as a gateway in and out of the cell nucleus. 

Gametocyte surface protein P45/48.
From the malaria parasite; a candidate protein for including in vaccines.

CCR4–NOT transcription complex subunit 9.
Regulates an important cellular process (the rate of mRNA degradation).

Ice nucleation protein.
Bacterial protein that can trigger ice formation at relatively high temperatures, causing frost damage to plants.

F2OH23.2 protein.
Plant protein; represents a potential new structural superfamily unlike anything seen before. 

Vitellogenin.
Involved in the immune system of egg-laying animals including honeybees.

We’ve been amazed by the rate at which AlphaFold has already become an essential tool for hundreds of thousands of scientists in labs and universities across the world to help them in their important work. As for our own work with AlphaFold, we prioritised applications that we felt would have the most positive social benefit, with a focus on initiatives that had been historically underfunded or overlooked. For example, we partnered with the Drugs for Neglected Diseases initiative (DNDi) to help advance their research, moving them closer to finding life-saving cures for diseases like leishmaniasis and Chagas disease that disproportionately affect people in poorer parts of the world. We also supported World Neglected Tropical Disease Day by creating structure predictions for organisms identified by the World Health Organisation as high-priority for their research, helping to further the study of diseases like leprosy and schistosomiasis, which devastate the lives of more than 1 billion people globally.

It’s been so inspiring to see the myriad ways the research community has taken AlphaFold, using it for everything from understanding diseases, to protecting honey bees, to deciphering biological puzzles, to looking deeper into the origins of life itself.

Other impressive examples, chosen by members of our AlphaFold team, include:

A biological jigsaw, chosen by Kathryn Tunyasuvunakool

In a recent special issue of Science, several groups described how AlphaFold helped them piece together the nuclear pore complex, one of the most fiendish puzzles in biology. The giant structure consists of hundreds of protein parts and controls everything that goes in and comes out of the cell nucleus. Its delicate structure was finally revealed by using existing experimental methods to capture its outline and AlphaFold predictions to complete and interpret any areas that were unclear. This powerful combination is now becoming routine in labs, unlocking new science and showing how experimental and computational techniques can work together.

A new world of bioinformatics, chosen by Richard Evans

Structural search tools like Foldseek and Dali are allowing users to very quickly search for entries similar to a given protein. This could be a first step toward mining large sequence datasets for practically useful proteins, such as those that break down plastic, and it could provide clues about protein function. The update of the database to include over 200 million predicted structures will further amplify this impact.

Direct impact on human health, chosen by John Jumper

AlphaFold is already having a significant, direct impact on human health. Meeting with researchers at the European Society of Human Genetics revealed how important AlphaFold structures are to biologists and clinicians trying to unravel the causes of rare genetic diseases. In addition, AlphaFold is accelerating drug discovery by providing a better understanding of newly identified proteins that could be drug targets, and helping scientists to more quickly find potential medicines that bind to them.

AlphaFold became an essential tool for biopharma research nearly overnight, including here at ROME Therapeutics where it is allowing us to predict protein structures in areas of the dark genome that have never been solved for before. AlphaFold speed and accuracy is accelerating the drug discovery process, and we’re only at the beginning of realising its impact on getting novel medicines to patients faster.

Rosana Kapeller, President & CEO of ROME Therapeutics and former CSO of Nimbus Therapeutics

Just the beginning

AlphaFold has launched biology into an era of structural abundance, unlocking scientific exploration at digital speed. The AlphaFold DB serves as a ‘Google search’ for protein structures, providing researchers with instant access to predicted models of the proteins they’re studying, enabling them to focus their effort and expedite experimental work. From fighting disease to developing vaccines, AlphaFold has already enabled incredible advances on some of our biggest global challenges, and this is just the beginning of the impact that we will start to see over the next few years. Our hope is that this expanded database will aid countless more scientists in their work and open up completely new avenues of scientific exploration, such as metaproteomics.

At DeepMind, we’re hard at work building on all this potential with significant investments in many areas, including partnering with our new sister Alphabet company Isomorphic Labs to reimagine the entire drug discovery process from first principles with an AI-first approach; establishing a wet lab at the renowned Francis Crick Institute to strengthen the connection between AI and experimental techniques to advance understanding of biology, including protein design and genomics; and expanding our AI for Science team to accelerate further progress on our fundamental biology research and apply AI to other fascinating and important scientific challenges, such as climate science, quantum chemistry, and fusion.

AlphaFold is a glimpse of the future, and what might be possible with computational and AI methods applied to biology. At its most fundamental level, biology can be thought of as an information processing system, albeit an extraordinarily complex and emergent one. Just as maths is the perfect description language for physics, we believe AI might turn out to be just the right technique to cope with the dynamic complexity of biology. AlphaFold is an important first proof point for this, and a sign of much more to come. As pioneers in the emerging field of ‘digital biology’, we’re excited to see the huge potential of AI starting to be realised as one of humanity’s most useful tools for advancing scientific discovery and understanding the fundamental mechanisms of life.