
An integrated map of genetic variation from 1,092 human genomes

Why this mattered

This paper helped turn human genetics from a reference-genome enterprise into a population-scale map of standing variation. Earlier genome projects had established a human reference sequence and catalogued common markers, but this study showed that low-coverage whole-genome and exome sequencing of 1,092 people could be integrated into a validated haplotype resource spanning 38 million SNPs, 1.4 million short indels, and more than 14,000 larger deletions. Its central shift was practical as much as conceptual: genetic variation was no longer treated mainly as isolated sites typed in selected cohorts, but as phased, population-contextualized sequence variation that could be imputed, compared, and interpreted across diverse ancestries.

What became newly possible was the routine use of large reference panels for genome-wide association studies, fine-mapping, imputation, population-genetic inference, and rare-variant discovery. By showing that low-frequency variants were geographically structured, and that purifying selection shaped their distribution especially at conserved and protein-altering sites, the paper clarified why ancestry-aware sampling and analysis were essential for disease genetics. It also widened the interpretive frame beyond protein-coding changes: each individual carried many rare non-coding variants at conserved sites, including variants predicted to disrupt regulatory motifs, making regulatory variation a concrete object for population-scale study rather than an abstract expectation.

The work anticipated later developments in genomic medicine and statistical genetics: larger reference panels and variant catalogues such as later 1000 Genomes releases, the Haplotype Reference Consortium panel, TOPMed, gnomAD, and national biobank sequencing projects all built on the same premise that variant interpretation depends on population frequency, haplotype structure, and functional annotation at scale. It did not by itself solve disease prediction or rare-variant interpretation, but it supplied the infrastructure that made those programs credible: a shared map against which new genomes could be compared, imputed, filtered, and biologically prioritized.

Abstract

By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.4 million short insertions and deletions, and more than 14,000 larger deletions. We show that individuals from different populations carry different profiles of rare and common variants, and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites. This resource, which captures up to 98% of accessible single nucleotide polymorphisms at a frequency of 1% in related populations, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations.
This report by the 1000 Genomes Project describes the genomes of 1,092 individuals from 14 human populations, providing a resource for common and low-frequency variant analysis in individuals from diverse populations. Integrative analyses reveal distinct profiles of rare and common variants in different populations. Rare-variant load varies across biological pathways, and each individual carries hundreds of rare non-coding variants at conserved sites, such as changes that disrupt transcription-factor-binding motifs.

Sources