When AlphaFold debuted, it promised to sharply increase the number of protein structures that could be predicted, a feat that had eluded researchers for decades and that could prove a crucial step in advancing drug development.
Now, DeepMind says its AlphaFold database of protein structures has been massively expanded.
Google’s AI outfit and the European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI) announced Thursday that DeepMind’s AlphaFold database now contains the structures of more than 200 million proteins. It’s a substantial jump from a year ago, when DeepMind announced that it had predicted the structures of only about 350,000 proteins.
According to the pair, the scientific possibilities extend well beyond disease and drug development to areas such as sustainability and food insecurity.
The two companies said in a statement announcing the expansion that the database now contains the structure of essentially every protein that has been sequenced, and that it is designed to work much like a Google search.
On top of that, the companies are keeping it free to use for the scientific community at large.
“Our hope is that this expanded database will aid countless more scientists and their important work and open up completely new avenues of scientific discovery,” DeepMind CEO Demis Hassabis told reporters earlier this week.
Protein structure, and the ability to predict it, has historically eluded researchers. Because a protein’s function is largely dictated by its structure, researchers were left in the dark about what certain proteins can and cannot do in human health and bodily function. Before AlphaFold’s launch, experts estimated that only a third of human protein structures were known, limiting drug development options.
For biologists, a database of this size could reveal new pockets, or avenues, for drugs that go after tumors or correct gene mutations, and could help researchers better understand antibiotic resistance.
1910 Genetics CEO Jen Nwankwo told Endpoints News that the database expansion could be a game changer for certain targets in drug discovery. However, questions remain about how well it will handle protein motion and more difficult, elusive targets.
Here are her thoughts:
This latest update that expands it to over 200 million proteins is truly a heroic moment for science, and the open-source nature of it would enable even broader adoption of the tool. From our perspective, we think there’s two ways to go to improve the accuracy and expand the capability of AlphaFold 2. Because while it performs well in predicting the 3D structure of monomeric proteins, it still requires significant research and domain knowledge to apply it to complex structures like membrane proteins, allosteric regions, conformational dynamics, etc. But we suspect that for challenging disease targets, AlphaFold perhaps isn’t the answer just yet.
And while the 200 million size is impressive, it’s important to note that the fine details, the small details are important.
Nwankwo elaborated that “beggars aren’t choosers,” reiterating that Google, from her view, is doing the entire world a big service by expanding the library. However, “It can get better to get us to some of these finer minutia around the biophysics of protein structure that we really need, to unlock some of these cryptic pockets for novel drug discovery,” the CEO further noted.
Exscientia’s Chris Radoux, an associate director in structural bioinformatics, agreed with Nwankwo, telling Endpoints in an email:
We no longer have to ask the question “Is there a structure?” but rather “How useful is the structure we have?” Right now, not every AF2 model has high enough confidence to be used in structure-based drug design. As a community, we can now focus on strategically solving experimental structures to feed AlphaFold2 the data it requires to predict all known protein structures with high confidence.
Radoux added that “This database…allows us to expand the known druggable genome through potentially revealing previously unknown drug binding sites — put another way, it could significantly expand the options scientists have to find new, novel medicines for previously unsolved medical challenges.”
DeepMind and the EBI also pointed to several case studies, including one on antibiotic resistance from the University of Colorado, Boulder. According to the companies, two researchers had been trying, without much luck, to verify protein structures using crystallography, an experimental method for determining protein structures. Part of the issue, according to the case study, was that no similar protein structure was available to serve as a starting point.
After AlphaFold provided a model that crystallography could verify, the researchers “were able to identify a bacterial protein structure in half an hour that had been elusive for 10 years,” the companies said.
Eric Topol of Scripps Research said in a statement that the newest development with AlphaFold will now allow “more biological mysteries to be solved each day.”
By: Paul Schloesser