A global reference for human genetic variation
Why this mattered
This paper mattered because it turned human genetic variation from a sparse, marker-based map into a broadly usable, population-scale reference. By sequencing and genotyping 2,504 individuals from 26 populations and phasing more than 88 million variants onto haplotypes, the 1000 Genomes Project made it possible to treat common human variation as a shared public coordinate system. Before this, genome-wide association studies often relied on limited SNP arrays and incomplete catalogs; after it, researchers could impute untyped variants, compare signals across ancestries, and localize association peaks with much higher resolution.
The paradigm shift was not only scale, but accessibility. The project established that low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping could be combined to produce a reliable, phased reference panel covering tens of millions of variants across many populations. This made rare and population-specific variation visible in a systematic way while confirming that most common variants are widely shared. As a result, disease studies could move beyond testing only directly genotyped markers and begin asking whether nearby unobserved variants, haplotypes, or ancestry-linked patterns better explained association signals.
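The contrast between widely shared common variants and population-restricted rare ones can be illustrated with a toy classifier. This is a minimal sketch and not the project's analysis: the frequencies are invented, the group labels `AFR`, `EUR`, and `EAS` are placeholders, the function name `classify_variant` is hypothetical, and the 1% cut-off is an assumption borrowed from the ">1%" frequency figure quoted for the resource.

```python
from typing import Dict, Tuple

# Hypothetical alternate-allele frequencies per population group.
# All numbers are made up for illustration only.
TOY_FREQS: Dict[str, Dict[str, float]] = {
    "var_a": {"AFR": 0.32, "EUR": 0.28, "EAS": 0.41},
    "var_b": {"AFR": 0.004, "EUR": 0.0, "EAS": 0.0},
    "var_c": {"AFR": 0.0, "EUR": 0.12, "EAS": 0.0},
    "var_d": {"AFR": 0.0, "EUR": 0.002, "EAS": 0.003},
}


def classify_variant(freqs: Dict[str, float],
                     common_threshold: float = 0.01) -> Tuple[str, str]:
    """Return (frequency class, sharing pattern) for one variant.

    Frequency class is "common" if the variant reaches the threshold in
    at least one group, otherwise "rare" (assumed 1% cut-off).
    Sharing is "all", "some", or "private", depending on how many groups
    carry the variant at a non-zero frequency.
    """
    present = [pop for pop, freq in freqs.items() if freq > 0]
    if not present:
        return ("absent", "none")
    freq_class = "common" if max(freqs.values()) >= common_threshold else "rare"
    if len(present) == len(freqs):
        sharing = "all"
    elif len(present) == 1:
        sharing = "private"
    else:
        sharing = "some"
    return (freq_class, sharing)


if __name__ == "__main__":
    for name, freqs in TOY_FREQS.items():
        print(name, classify_variant(freqs))
```

In this toy table the one variant found in every group is common, while both rare variants are confined to one or two groups, which is the qualitative pattern the text describes. Real analyses work from sample counts and account for sampling error, which this sketch ignores.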
Its influence is visible in later genomic medicine and population genetics infrastructure. The 1000 Genomes reference panel became a foundation for genotype imputation, fine mapping, variant filtering in rare-disease studies, ancestry-aware association analysis, and benchmarking of sequencing pipelines. Later resources such as gnomAD, TOPMed, UK Biobank-scale sequencing efforts, and diverse national biobank projects extended the same basic logic: large, open or semi-open population reference datasets make individual genomes interpretable. The paper therefore helped shift human genetics from cataloging isolated variants toward using global reference variation as a practical engine for discovery.
Abstract
The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
Results for the final phase of the 1000 Genomes Project are presented including whole-genome sequencing, targeted exome sequencing, and genotyping on high-density SNP arrays for 2,504 individuals across 26 populations, providing a global reference data set to support biomedical genetics.
The 1000 Genomes Project has sought to comprehensively catalogue human genetic variation across populations, providing a valuable public genomic resource. The data obtained so far have found applications ranging from association studies and fine mapping studies to the filtering of likely neutral variants in rare-disease cohorts. The authors now report on the final phase of the project, phase 3, which covers previously uncharacterized areas of human genetic diversity in terms of the populations sampled and categories of characterized variation. The sample now includes more than 2,500 individuals from 26 global populations, with low coverage whole-genome and deep exome sequencing, as well as dense microarray genotyping. They find that while most common variants are shared across populations, rarer variants are often restricted to closely related populations. The authors also demonstrate the use of the phase 3 dataset as a reference panel for imputation to improve the resolution in genetic association studies.
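To make the idea of a reference panel for imputation concrete, the following toy sketch fills in untyped sites on one haplotype by copying alleles from the closest-matching reference haplotype. It is a deliberately simplified stand-in, not the method used by the project or by production imputation software, which typically rely on probabilistic haplotype-copying models. The panel, the target, and the function name `impute_from_panel` are all hypothetical.

```python
from typing import List, Optional, Sequence


def impute_from_panel(panel: Sequence[Sequence[int]],
                      target: Sequence[Optional[int]]) -> List[int]:
    """Fill untyped sites (None) in `target` from a phased reference panel.

    Each haplotype is a list of 0/1 alleles over the same ordered sites.
    The panel haplotype with the fewest mismatches at the typed sites is
    chosen, and its alleles are copied into the untyped positions.
    """
    if not panel:
        raise ValueError("reference panel is empty")
    typed = [i for i, allele in enumerate(target) if allele is not None]

    def mismatches(hap: Sequence[int]) -> int:
        # Count disagreements only where the target was actually genotyped.
        return sum(hap[i] != target[i] for i in typed)

    # Ties are resolved in favour of the first haplotype in panel order.
    best = min(panel, key=mismatches)
    return [best[i] if allele is None else allele
            for i, allele in enumerate(target)]


# Hypothetical phased reference haplotypes over six sites.
TOY_PANEL = [
    [0, 0, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0, 1],
]

# An array-genotyped haplotype: sites 1, 3, and 5 were not typed.
TOY_TARGET = [1, None, 0, None, 0, None]

if __name__ == "__main__":
    print(impute_from_panel(TOY_PANEL, TOY_TARGET))
```

The sketch shows why panel size and diversity matter: an untyped allele can only be recovered if some reference haplotype carries it on a matching background. A single best match also discards uncertainty, which real tools keep by reporting genotype probabilities.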
Related
- cite → An integrated map of genetic variation from 1,092 human genomes — The 2015 global reference is the expanded 1000 Genomes reference panel built from the earlier 1,092-genome integrated variant map.
- cite → A map of human genome variation from population-scale sequencing — The 2015 global reference extends the 2010 1000 Genomes pilot map from population-scale sequencing into a larger catalog of human variation.
- cite → An integrated encyclopedia of DNA elements in the human genome — The 2015 variation reference relates to ENCODE by using functional genome annotations to interpret noncoding human genetic variants.
- cite ← The UK Biobank resource with deep phenotyping and genomic data — UK Biobank cites the 1000 Genomes global variant reference as a baseline resource for interpreting human genomic variation.
- cite ← Analysis of protein-coding genetic variation in 60,706 humans — ExAC complements the 1000 Genomes global variation map by providing deeper exome-scale catalogs of rare protein-coding variants.
- enables ← A map of human genome variation from population-scale sequencing — The 2010 1000 Genomes pilot established population-scale sequencing and variant-catalog methods that enabled the 2015 global reference haplotype map.