The Protein Data Bank

Why this mattered

The 2000 Protein Data Bank paper mattered because it marked structural biology’s transition from a literature-centered field to an infrastructure-centered one. The PDB had existed since 1971, but this paper described it as a single worldwide archive with standardized deposition, validation, and public access for macromolecular structures. That framing changed what a “structure” meant scientifically: not just a result reported in a paper, but reusable data that could be searched, compared, reanalyzed, and built into new computational workflows.

A shared, standardized archive made large-scale structural biology possible. Researchers could systematically compare protein folds, study ligand-binding sites across families, validate new structures against prior examples, and use experimentally determined coordinates for homology modeling, rational drug design, molecular dynamics, and annotation of newly sequenced genomes. The paper’s importance therefore lay not in a single biological discovery but in consolidating a shared reference layer for molecular biology.

Many later breakthroughs depended on that layer. Structural genomics programs used the PDB as both target map and output archive; cryo-EM and other methods expanded into the same public structural ecosystem; and modern machine-learning systems for protein structure prediction, including AlphaFold-era models, relied on decades of PDB-deposited structures as training data and benchmarks. The paradigm shift was that biological structure became a cumulative, machine-readable public commons.

Abstract

The Protein Data Bank (PDB; http://www.rcsb.org/pdb/) is the single worldwide archive of structural data of biological macromolecules. This paper describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and near-term plans for the future development of the resource.

Sources