A framework for variation discovery and genotyping using next-generation DNA sequencing data
Why this mattered
DePristo et al. mattered because it turned next-generation sequencing variant discovery from a collection of ad hoc read-filtering and SNP-calling steps into a general, reproducible statistical framework. Building on the Genome Analysis Toolkit (GATK), the paper laid out a five-stage process: initial read mapping, local realignment around indels, base quality score recalibration, SNP discovery and genotyping across multiple samples simultaneously, and machine-learning-based filtering to separate true variants from sequencing artifacts. Its central shift was to treat variant calling as an integrated inference problem over sequencing evidence, not simply as counting mismatches against a reference genome.
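The contrast between inference and mismatch counting can be made concrete with a small example. The following is a minimal sketch of a diploid genotype likelihood calculation from base calls and Phred quality scores. It is not the GATK implementation: the function names are hypothetical, and it assumes independent reads, a flat prior over genotypes, errors spread evenly over the three other bases, and no use of mapping quality.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"


def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def base_likelihood(observed, allele, q):
    """P(observed base | true allele), assuming errors are spread
    evenly over the three other bases (a simplifying assumption)."""
    e = phred_to_error(q)
    return 1 - e if observed == allele else e / 3


def genotype_log10_likelihoods(pileup):
    """Return log10 P(pileup | genotype) for all 10 diploid genotypes.

    pileup is a list of (base, phred_quality) tuples at one site.
    Each read is assumed to come from either allele with probability 0.5.
    """
    result = {}
    for a1, a2 in combinations_with_replacement(BASES, 2):
        total = 0.0
        for base, q in pileup:
            p = (0.5 * base_likelihood(base, a1, q)
                 + 0.5 * base_likelihood(base, a2, q))
            total += math.log10(p)
        result[a1 + a2] = total
    return result


def call_genotype(pileup):
    """Pick the maximum-likelihood genotype (flat prior, illustration only)."""
    likelihoods = genotype_log10_likelihoods(pileup)
    return max(likelihoods, key=likelihoods.get)
```

With ten high-quality `A` reads and a single `G` read at quality 2, `call_genotype([("A", 30)] * 10 + [("G", 2)])` returns `"AA"`: the low-quality mismatch is discounted by its quality score, where a fixed mismatch-count threshold would weigh it the same as any other read.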
This made results from large-scale human resequencing comparable across studies. After this work, researchers could analyze exomes and whole genomes with standardized pipelines that produced calibrated variant calls suitable for population genetics, Mendelian disease studies, and cancer genomics. The framework was especially important because sequencing output was growing faster than manual inspection and hand-tuned filtering could handle; GATK helped make thousands and then hundreds of thousands of genomes analyzable under shared assumptions and quality metrics.
Its influence is visible in the genomic resources and clinical workflows that depended on reliable variant catalogs, including large cohort projects such as the 1000 Genomes Project, ExAC, gnomAD, and many rare-disease sequencing programs. By making variant discovery scalable and statistically disciplined, the paper helped establish the computational substrate for modern precision medicine: not the biological discovery of variation itself, but the ability to call, compare, filter, and interpret variation at population scale.
Abstract
(no abstract available)
Related
- cite → Fast and accurate short read alignment with Burrows–Wheeler transform — The GATK framework depends on BWA's Burrows-Wheeler short-read alignment to map sequencing reads before variant discovery.
- cite → The Sequence Alignment/Map format and SAMtools — The GATK framework uses the SAM/BAM alignment format and SAMtools ecosystem as core infrastructure for storing and processing sequencing reads.
- cite → Ultrafast and memory-efficient alignment of short DNA sequences to the human genome — The GATK framework cites Bowtie as an alternative ultrafast short-read aligner for producing mapped reads used in downstream genotyping.
- cite → A map of human genome variation from population-scale sequencing — The GATK framework was motivated and validated by population-scale variant discovery efforts such as the 1000 Genomes human variation map.
- enables → Analysis of protein-coding genetic variation in 60,706 humans — GATK's variant discovery and genotyping framework enabled ExAC to call and aggregate protein-coding variants across tens of thousands of exomes.
- enables → Minimap2: pairwise alignment for nucleotide sequences — GATK's variant-discovery workflows depend on accurate read-to-reference alignment, a core capability later accelerated and generalized by Minimap2's seed-chain-align method.
- cite ← Mutational landscape determines sensitivity to PD-1 blockade in non–small cell lung cancer — The NSCLC PD-1 study relies on GATK-style variant discovery and genotyping for next-generation sequencing mutation calls.
- cite ← Analysis of protein-coding genetic variation in 60,706 humans — ExAC variant calls rely on GATK-style next-generation sequencing discovery and genotyping methods.
- cite ← The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans — The GTEx pilot depends on GATK-based variant discovery and genotyping for processing next-generation sequencing data.
- cite ← Minimap2: pairwise alignment for nucleotide sequences — Minimap2 is positioned upstream of variant-discovery workflows such as GATK by providing read alignments used for genotyping from sequencing data.
Sources
- DOI: https://doi.org/10.1038/ng.806
- OpenAlex: https://openalex.org/W2168133698