In December 2020, DeepMind took the world of biology by surprise when it solved a 50-year grand challenge with AlphaFold, an AI tool that predicts the structure of proteins. Last week, the London-based company published full details of the tool and released its source code.

Now the firm has announced that it has used its AI to predict the shapes of almost every protein in the human body, as well as the shapes of hundreds of thousands of other proteins found in 20 of the most widely studied organisms, including yeast, fruit flies, and mice. The breakthrough could help biologists around the world better understand diseases and develop new drugs.

So far, the trove consists of 350,000 newly predicted protein structures. DeepMind says it will predict and publish structures for more than 100 million more in the next few months, covering more or less every protein known to science.

“Protein folding is a problem I’ve had my eye on for more than 20 years,” says Demis Hassabis, co-founder of DeepMind. “It’s been a huge project for us. I would say this is the biggest thing we have done so far. And it’s the most exciting in a way, because it should have the biggest impact in the world outside of AI.”

Proteins are made up of long ribbons of amino acids, which twist themselves into complicated knots. Knowing the shape of a protein’s knot reveals what that protein does, which is crucial for understanding how diseases work and developing new drugs, or for identifying organisms that can help tackle pollution and climate change. Determining the shape of a protein takes weeks or months in the lab; AlphaFold can predict shapes to the nearest atom in a day or two.

The new database should make life even easier for biologists. AlphaFold may be available for researchers to use, but not everyone will want to run the software themselves. “It’s much easier to grab a structure from a database than to run it on your own computer,” says David Baker of the Institute for Protein Design at the University of Washington, whose lab has built its own tool for predicting protein structure, called RoseTTAFold, based on AlphaFold’s approach.

Over the past few months, Baker’s team has been working with biologists who were previously stuck trying to figure out the shapes of the proteins they were studying. “There’s a lot of great biological research that has really been sped up,” he says. A public database with hundreds of thousands of ready-made protein shapes should be an even bigger accelerator.

“It looks astonishingly impressive,” says Tom Ellis, a synthetic biologist at Imperial College London, who is excited to try the database. But he cautions that most of the predicted shapes have not yet been verified in the lab.

In the new version of AlphaFold, predictions come with a confidence score that the tool uses to flag how close it thinks each predicted shape is to the real thing. Using this measure, DeepMind found that AlphaFold predicted shapes for 36% of human proteins with accuracy down to the level of individual atoms. This is good enough for drug development, says Hassabis.

Previously, after decades of work, only 17% of the proteins in the human body had their structures determined in the lab. If AlphaFold’s predictions are as accurate as DeepMind says, the tool has more than doubled that number in just a few weeks.

Even predictions that are not fully accurate at the atomic level are still useful. For more than half of the proteins in the human body, AlphaFold has predicted a shape that should be good enough for researchers to figure out the protein’s function. The rest of AlphaFold’s current predictions are either incorrect or are for the third of human proteins that have no fixed structure until they bind with others. “They’re floppy,” says Hassabis.

“The fact that it can be applied at this level of quality is impressive,” says Mohammed AlQuraishi, a systems biologist at Columbia University who has developed his own software for predicting protein structures. He also points out that having structures for most of the proteins in an organism will make it possible to study how these proteins work as a system, not just in isolation. “That’s what I think is most exciting,” he says.

DeepMind is releasing its tools and predictions free of charge and will not say whether it plans to make money from them in the future. It is not ruling out the possibility, however. To set up and run the database, DeepMind is partnering with the European Molecular Biology Laboratory, an international research organization that already hosts a vast database of protein information.

For now, AlQuraishi can’t wait to see what researchers do with the new data. “It’s pretty spectacular,” he says. “I don’t think any of us thought we’d get here this fast. It’s mind-boggling.”