Highly accurate protein structure prediction with AlphaFold

Why this mattered

AlphaFold changed protein structure prediction from a specialist, often unreliable modeling problem into a broadly usable source of atomic-scale structural hypotheses. In CASP14, the system reached accuracy competitive with experimental structures for a majority of targets, which marked a discontinuity in a field that had been benchmarked for decades against the same problem: inferring three-dimensional fold from amino-acid sequence. The key shift was not only higher accuracy, but calibrated usefulness: AlphaFold’s confidence estimates let researchers distinguish regions likely to be structurally reliable from regions that remained uncertain, making predictions actionable rather than merely suggestive. See the Nature paper.
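AlphaFold reports these confidence estimates per residue as pLDDT, on a 0 to 100 scale. The sketch below reads per-residue scores from the CA atoms and sorts them into the four bands that the AlphaFold Protein Structure Database uses for colouring. It assumes a single-model PDB file whose B-factor column holds pLDDT, which is the convention in AlphaFold's PDB output. The function names `residue_plddt`, `confidence_band`, and `confident_residues` are hypothetical, and the code is not AlphaFold's own tooling.

```python
from __future__ import annotations


def residue_plddt(pdb_text: str) -> dict[tuple[str, int], float]:
    """Map (chain ID, residue number) to pLDDT, read from CA atoms only.

    Assumes a single-model PDB whose B-factor field (columns 61-66)
    stores pLDDT, as in AlphaFold's PDB output. In experimental PDB
    files this field holds real temperature factors instead.
    """
    scores: dict[tuple[str, int], float] = {}
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns: atom name 13-16, chain 22, residue 23-26.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            key = (line[21], int(line[22:26]))
            scores[key] = float(line[60:66])
    return scores


def confidence_band(plddt: float) -> str:
    """Classify a pLDDT value using the AlphaFold DB colour bands."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


def confident_residues(scores: dict[tuple[str, int], float],
                       cutoff: float = 70.0) -> list[tuple[str, int]]:
    """Residues above a pLDDT cutoff (70 here is an illustrative choice)."""
    return sorted(key for key, value in scores.items() if value > cutoff)
```

Reading only the CA atoms gives one value per residue. AlphaFold writes the same per-residue pLDDT to every atom of that residue, so any atom would work, but CA is a convenient single representative.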

After this paper, predicted structures became available for many proteins whose crystallography, NMR, or cryo-EM data were absent, slow to obtain, or experimentally difficult to collect. That changed everyday biological work. Researchers could map mutations, infer active or binding sites, guide construct design, interpret domains of unknown function, and prioritize experiments using models that were often good enough to reason from. Experimental validation was still required for dynamics, ligands, complexes, disorder, and conformational change. The immediate follow-on was proteome-scale prediction, including the AlphaFold human proteome paper and the AlphaFold Protein Structure Database. Together they turned structural coverage from a scarce experimental resource into a near-default annotation layer for sequence databases.

Its longer-term importance was that it made learned structural biology a foundation for later systems, rather than a demonstration. AlphaFold-Multimer, RoseTTAFold-family methods, large protein language models, generative protein design tools, and AlphaFold 3’s biomolecular interaction predictions all built on the same post-2021 premise: that deep networks trained on evolutionary and structural data can model biological molecules well enough to reshape discovery workflows. The paper did not eliminate experimental structural biology, but it reset its role, turning many structure determinations from first glimpses into tests, refinements, and context-specific validations of powerful computational priors.

Abstract

Proteins are essential to life, and understanding their structure can facilitate a mechanistic understanding of their function. Through an enormous experimental effort [1–4], the structures of around 100,000 unique proteins have been determined [5], but this represents a small fraction of the billions of known protein sequences [6,7]. Structural coverage is bottlenecked by the months to years of painstaking effort required to determine a single protein structure. Accurate computational approaches are needed to address this gap and to enable large-scale structural bioinformatics. Predicting the three-dimensional structure that a protein will adopt based solely on its amino acid sequence (the structure prediction component of the ‘protein folding problem’ [8]) has been an important open research problem for more than 50 years [9]. Despite recent progress [10–14], existing methods fall far short of atomic accuracy, especially when no homologous structure is available. Here we provide the first computational method that can regularly predict protein structures with atomic accuracy even in cases in which no similar structure is known. We validated an entirely redesigned version of our neural network-based model, AlphaFold, in the challenging 14th Critical Assessment of protein Structure Prediction (CASP14) [15], demonstrating accuracy competitive with experimental structures in a majority of cases and greatly outperforming other methods. Underpinning the latest version of AlphaFold is a novel machine learning approach that incorporates physical and biological knowledge about protein structure, leveraging multi-sequence alignments, into the design of the deep learning algorithm.
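The abstract's reference to multi-sequence alignments rests on a long-standing observation: residues in contact tend to mutate together. As a result, correlated substitutions between two alignment columns carry structural signal. AlphaFold's network reads the alignment directly and learns what to extract, rather than computing a fixed statistic. The sketch below therefore illustrates only the classical idea behind that signal, using plain mutual information between two MSA columns. The function `column_mi` and the four-sequence alignment are hypothetical toy examples.

```python
from __future__ import annotations

import math
from collections import Counter


def column_mi(msa: list[str], i: int, j: int) -> float:
    """Mutual information (in nats) between alignment columns i and j.

    Deliberately minimal: no sequence weighting, gap handling,
    pseudocounts, or average-product correction, all of which practical
    coevolution methods add.
    """
    n = len(msa)
    pair_counts = Counter((seq[i], seq[j]) for seq in msa)
    counts_i = Counter(seq[i] for seq in msa)
    counts_j = Counter(seq[j] for seq in msa)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: columns 0 and 3 co-vary (A pairs with E,
# V with D), while columns 0 and 1 vary independently.
toy_msa = ["AKLE", "VKLD", "ARIE", "VRID"]
print(round(column_mi(toy_msa, 0, 3), 3))  # ln 2, printed as 0.693
print(round(column_mi(toy_msa, 0, 1), 3))  # 0.0
```

In this toy case, perfectly coupled columns score ln 2 and independent columns score zero. Real alignments need many sequences, plus the corrections listed in the docstring, before scores like these become useful contact predictions.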

Related

cite → Accelerated Profile HMM Searches. AlphaFold uses fast profile-HMM sequence-search tools such as HMMER to construct multiple sequence alignments for protein structure prediction.
cite → Deep Residual Learning for Image Recognition. AlphaFold's neural architecture uses residual connections, popularized by ResNet, to train deep networks for protein-structure inference.
cite ← AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. The database paper uses AlphaFold's structure-prediction method to generate predicted protein structures at proteome scale.
cite ← ColabFold: making protein folding accessible to all. ColabFold makes AlphaFold2-style protein-structure prediction accessible through faster, publicly available MSA generation and inference workflows.
enables ← Accelerated Profile HMM Searches. Accelerated profile-HMM searches made it practical for AlphaFold to build deep multiple sequence alignments, from which it extracts evolutionary constraints for structure prediction.
enables ← Deep Residual Learning for Image Recognition. Residual connections let AlphaFold train very deep networks that propagate pairwise and spatial protein-structure features without the loss of training accuracy that very deep plain networks suffer.
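The residual connections credited to ResNet make each block compute x + f(x). In other words, the block learns an update to its input rather than a full replacement. The pure-Python sketch below follows that definition. The names `residual` and `run_stack` and the toy updates are hypothetical stand-ins. In AlphaFold, the update functions are attention, triangle-update, and transition layers acting on MSA and pair representations, not these toy lists.

```python
from typing import Callable, List

Vector = List[float]
Update = Callable[[Vector], Vector]


def residual(update: Update) -> Update:
    """Wrap an update function f so that the block returns x + f(x)."""
    def block(x: Vector) -> Vector:
        delta = update(x)
        return [xi + di for xi, di in zip(x, delta)]
    return block


def run_stack(blocks: List[Update], x: Vector) -> Vector:
    """Apply blocks in sequence, feeding each output to the next block."""
    for block in blocks:
        x = block(x)
    return x


# If every update outputs zeros, each block is exactly the identity, so even a
# 100-block stack passes its input through unchanged. This is one intuition for
# why very deep residual stacks remain trainable.
zero_block = residual(lambda x: [0.0] * len(x))
print(run_stack([zero_block] * 100, [1.0, -2.0]))  # [1.0, -2.0]

# A nonzero update is added on top of the input rather than replacing it.
shift_block = residual(lambda x: [0.5 for _ in x])
print(shift_block([1.0, 2.0]))  # [1.5, 2.5]
```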

Sources

DOI: https://doi.org/10.1038/s41586-021-03819-2
OpenAlex: https://openalex.org/W3177828909
