An integrated map of genetic variation from 1,092 human genomes
Why this mattered
This paper helped turn human genetics from a reference-genome enterprise into a population-scale map of standing variation. Earlier genome projects had established a human reference sequence and catalogued common markers, but this study showed that low-coverage whole-genome and exome sequencing across 1,092 people from 14 populations could be integrated into a validated haplotype resource spanning 38 million SNPs, 1.4 million short indels, and more than 14,000 larger deletions. Its central shift was practical as much as conceptual: genetic variation was no longer treated mainly as isolated sites typed in selected cohorts, but as phased, population-contextualized sequence variation that could be imputed, compared, and interpreted across diverse ancestries.
What became newly possible was the routine use of large reference panels for genome-wide association studies, fine-mapping, imputation, population-genetic inference, and rare-variant discovery. By showing that low-frequency variants were geographically structured, and that purifying selection shaped their distribution especially at conserved and protein-altering sites, the paper clarified why ancestry-aware sampling and analysis were essential for disease genetics. It also widened the interpretive frame beyond protein-coding changes: each individual carried many rare non-coding variants at conserved sites, including variants predicted to disrupt regulatory motifs, making regulatory variation a concrete object for population-scale study rather than an abstract expectation.
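To make "geographically structured" concrete, the following is a minimal Python sketch of how a variant can look low-frequency in a pooled sample while being concentrated in one population. Everything in it is illustrative: the population labels, the allele counts, and the helper names are hypothetical, and the frequency bins (rare below 0.5%, low-frequency from 0.5% to 5%, common above 5%) are assumed thresholds passed as parameters, not values taken from this page.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: (alternate-allele count, chromosomes sampled)
# per population. These numbers are invented for illustration only.
TOY_COUNTS: Dict[str, Tuple[int, int]] = {
    "POP_A": (6, 200),
    "POP_B": (0, 180),
    "POP_C": (1, 220),
}


def allele_frequency(alt_count: int, n_chrom: int) -> float:
    """Alternate-allele frequency = alt alleles / chromosomes sampled."""
    if n_chrom <= 0:
        raise ValueError("n_chrom must be positive")
    return alt_count / n_chrom


def frequency_class(freq: float, rare_max: float = 0.005,
                    low_max: float = 0.05) -> str:
    """Bin a frequency using assumed thresholds (0.5% and 5%)."""
    if freq < rare_max:
        return "rare"
    if freq <= low_max:
        return "low-frequency"
    return "common"


def summarize(counts: Dict[str, Tuple[int, int]]) -> dict:
    """Compare per-population frequencies with the pooled frequency."""
    per_pop = {pop: allele_frequency(alt, n) for pop, (alt, n) in counts.items()}
    total_alt = sum(alt for alt, _ in counts.values())
    total_n = sum(n for _, n in counts.values())
    pooled = allele_frequency(total_alt, total_n)
    carriers: List[str] = [pop for pop, f in per_pop.items() if f > 0]
    return {
        "per_population": per_pop,
        "per_population_class": {p: frequency_class(f) for p, f in per_pop.items()},
        "pooled": pooled,
        "pooled_class": frequency_class(pooled),
        "populations_with_variant": carriers,
    }


if __name__ == "__main__":
    for key, value in summarize(TOY_COUNTS).items():
        print(key, value)
```

In this toy case the pooled frequency falls in the low-frequency bin, yet the variant is absent from one population and rare in another, which is the kind of pattern that makes ancestry-matched reference panels matter for imputation and association testing.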
The work directly anticipated later breakthroughs in genomic medicine and statistical genetics: larger reference panels such as later 1000 Genomes releases, Haplotype Reference Consortium resources, TOPMed, gnomAD, and national biobank sequencing projects all built on the same premise that variant interpretation depends on population frequency, haplotype structure, and functional annotation at scale. It did not by itself solve disease prediction or rare-variant interpretation, but it supplied the infrastructure that made those programs credible: a shared map against which new genomes could be compared, imputed, filtered, and biologically prioritized.
Abstract
By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.4 million short insertions and deletions, and more than 14,000 larger deletions. We show that individuals from different populations carry different profiles of rare and common variants, and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites. This resource, which captures up to 98% of accessible single nucleotide polymorphisms at a frequency of 1% in related populations, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations.
This report by the 1000 Genomes Project describes the genomes of 1,092 individuals from 14 human populations, providing a resource for common and low-frequency variant analysis in individuals from diverse populations. Integrative analyses reveal profiles of rare and common variants in different populations. The frequencies of rare variants vary across biological pathways, and hundreds of rare, non-coding variants at conserved sites — such as changes disrupting transcription-factor motifs — can be established for each individual.
Related
- cite → Integrated genomic analyses of ovarian carcinoma — The 1000 Genomes integrated map cites TCGA ovarian carcinoma analysis as an example of using large-scale genomic variation data to interpret cancer genomes.
- cite → A map of human genome variation from population-scale sequencing — The 2012 1000 Genomes integrated map extends the 2010 pilot map from population-scale sequencing to a larger, more comprehensive catalog of human genetic variants.
- cite ← A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping — The 3D genome map uses 1000 Genomes variation data to relate chromatin architecture to human genetic variation.
- cite ← LD Score regression distinguishes confounding from polygenicity in genome-wide association studies — LD Score regression uses 1000 Genomes reference haplotypes to estimate linkage disequilibrium scores for genome-wide association summary statistics.
- cite ← Signatures of mutational processes in human cancer — The cancer mutational-signatures paper uses population-scale genomic variation resources such as the 1000 Genomes map to distinguish somatic mutations from inherited variants.
- cite ← A global reference for human genetic variation — The 2015 global reference is the expanded 1000 Genomes reference panel built from the earlier 1,092-genome integrated variant map.