The Protein Data Bank
Why this mattered
The 2000 Protein Data Bank paper mattered because it marked structural biology’s transition from a literature-centered field to an infrastructure-centered one. The PDB had existed since 1971, but this paper described it as the single worldwide archive for macromolecular structures, with standardized deposition, validation, and public access. That changed what a “structure” meant scientifically: not just a result reported in a paper, but reusable data that could be searched, compared, reanalyzed, and built into new computational workflows.
This made large-scale structural biology possible. Researchers could systematically compare protein folds, study ligand-binding sites across families, validate new structures against prior examples, and use experimentally determined coordinates for homology modeling, rational drug design, molecular dynamics, and annotation of newly sequenced genomes. The paper’s importance therefore lay not in a single biological discovery but in the consolidation of a shared reference layer for molecular biology.
Many later breakthroughs depended on that layer. Structural genomics programs used the PDB as both target map and output archive; structures determined by cryo-EM and other methods were deposited into the same public archive; and modern machine-learning systems for protein structure prediction, including AlphaFold-era models, relied on decades of PDB-deposited structures as training data and benchmarks. The paradigm shift was that biological structure became a cumulative, machine-readable public commons.
Abstract
The Protein Data Bank (PDB; http://www.rcsb.org/pdb/) is the single worldwide archive of structural data of biological macromolecules. This paper describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and near-term plans for the future development of the resource.
Related
- cite → Dictionary of protein secondary structure: Pattern recognition of hydrogen‐bonded and geometrical features — The Protein Data Bank uses DSSP secondary-structure assignments derived from hydrogen-bonding and geometric criteria.
- cite → Improved tools for biological sequence comparison. — The Protein Data Bank cites FASTA-style sequence comparison as a tool for finding related biological sequences.
- cite → Basic local alignment search tool — The Protein Data Bank cites BLAST as a standard method for local sequence similarity searches against the sequences of deposited structures.
- enables → The FAIR Guiding Principles for scientific data management and stewardship — The Protein Data Bank exemplified reusable standardized scientific data infrastructure, enabling FAIR's later principles for findable and interoperable data stewardship.
- cite ← The FAIR Guiding Principles for scientific data management and stewardship — The FAIR principles cite the Protein Data Bank as a domain repository exemplifying persistent, reusable, machine-accessible scientific data stewardship.
- enables ← Dictionary of protein secondary structure: Pattern recognition of hydrogen‐bonded and geometrical features — DSSP enables the Protein Data Bank by standardizing secondary-structure annotation from deposited three-dimensional protein coordinates.
- enables ← Improved tools for biological sequence comparison. — Improved sequence-comparison tools enable the Protein Data Bank by supporting protein sequence alignment and annotation for structural entries.
- enables ← Basic local alignment search tool — BLAST enables the Protein Data Bank by making rapid sequence similarity search a core way to identify and relate protein structures.