AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models
Why this mattered
AlphaFold DB mattered because it changed AlphaFold2 from a landmark method into shared scientific infrastructure. The AlphaFold2 paper showed that computational structure prediction could often approach experimental accuracy, but this database made that capability broadly usable by publishing predicted coordinates, confidence measures, and error estimates for whole proteomes rather than requiring each lab to run large-scale inference itself. In doing so, it shifted structural biology from a world where most protein sequences had no structural hypothesis to one where many proteins could be inspected, compared, and annotated immediately, with uncertainty made explicit through pLDDT and predicted aligned error.
The practical consequence was a major expansion of what could be asked at scale. Researchers could map disease variants onto plausible three-dimensional contexts, infer domain architecture, guide mutagenesis, prioritize constructs for crystallography or cryo-EM, and compare folds across organisms even when no experimental structure existed. The database did not replace experimental structure determination: flexible regions, complexes, ligands, alternate conformations, and low-confidence predictions still required care. But it changed experiments by making predicted structure a default starting point rather than a scarce downstream prize.
Its broader legacy was to help normalize AI-generated biological models as reference resources. Later work on protein complexes, protein design, variant interpretation, and structure-aware functional annotation built on the assumption that predicted structures could be queried at proteome scale. AlphaFold DB also set an important precedent for coupling machine-learning breakthroughs with open, searchable, confidence-annotated databases, making the paradigm shift not just algorithmic but infrastructural: high-quality predicted structure became part of the common substrate of biology.
Abstract
The AlphaFold Protein Structure Database (AlphaFold DB, https://alphafold.ebi.ac.uk) is an openly accessible, extensive database of high-accuracy protein-structure predictions. Powered by AlphaFold v2.0 of DeepMind, it has enabled an unprecedented expansion of the structural coverage of the known protein-sequence space. AlphaFold DB provides programmatic access to and interactive visualization of predicted atomic coordinates, per-residue and pairwise model-confidence estimates and predicted aligned errors. The initial release of AlphaFold DB contains over 360,000 predicted structures across 21 model-organism proteomes, which will soon be expanded to cover most of the (over 100 million) representative sequences from the UniRef90 data set.
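As an illustration of how the per-residue confidence can be used once a model has been downloaded, here is a minimal Python sketch. It assumes a PDB-format coordinate file in which pLDDT is stored in the B-factor column, which is how AlphaFold DB models carry it. The function names `plddt_per_residue` and `confidence_band` are hypothetical helpers, not part of any AlphaFold DB client, and the band thresholds follow the colour scheme commonly shown on the database pages (above 90, 70 to 90, 50 to 70, below 50).

```python
def plddt_per_residue(pdb_text):
    """Map (chain, residue number) -> pLDDT, read from C-alpha ATOM records.

    Assumes fixed-column PDB format with pLDDT in the B-factor field
    (columns 61-66), as in AlphaFold DB coordinate files.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":  # one value per residue is enough
            continue
        chain = line[21]
        resseq = int(line[22:26])
        scores[(chain, resseq)] = float(line[60:66])
    return scores


def confidence_band(plddt):
    """Coarse confidence label for a pLDDT value (thresholds are an assumption)."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


def low_confidence_residues(pdb_text, cutoff=50.0):
    """List residues whose pLDDT falls below the cutoff, in ascending order."""
    scores = plddt_per_residue(pdb_text)
    return sorted(key for key, value in scores.items() if value < cutoff)
```

Calling `low_confidence_residues(open(path).read())` on a downloaded model gives a quick list of regions to treat with caution, for example likely disordered segments, before mapping variants or designing constructs.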
Related
- cite → Highly accurate protein structure prediction with AlphaFold — The AlphaFold database paper uses AlphaFold's high-accuracy structure-prediction method to populate predicted protein structures at proteome scale.