Principal components analysis corrects for stratification in genome-wide association studies

Why this mattered

Before this paper, population stratification was one of the central threats to genome-wide association studies: ancestry differences between cases and controls could make ordinary allele-frequency tests report disease associations that actually reflected demographic history. Price et al. (2006) made stratification correction practical at GWAS scale by using principal components to infer continuous axes of ancestry directly from dense genotype data, then adjusting both genotypes and phenotypes for those axes before testing each marker. The key advance was not the recognition that ancestry mattered, which was already well known, but the demonstration that it could be modeled explicitly, efficiently, and marker by marker in studies with hundreds of thousands of SNPs.
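The idea of inferring ancestry axes and then adjusting each association test for them can be illustrated with a minimal NumPy sketch. This is not the EIGENSTRAT implementation: the function names (`top_pcs`, `residualize`, `adjusted_stat`), the simplified allele-frequency normalization, and the simulated two-population data are all assumptions made for illustration. The statistic follows the commonly described form, (N - K - 1) times the squared correlation between the ancestry-adjusted genotype and the ancestry-adjusted phenotype.

```python
import numpy as np


def top_pcs(genotypes, k):
    """Top-k sample-level PCs of an (n_samples, n_snps) matrix of 0/1/2 allele counts."""
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                    # sample allele frequency per SNP
    keep = (p > 0) & (p < 1)                    # drop monomorphic SNPs
    g, p = g[:, keep], p[keep]
    z = (g - 2 * p) / np.sqrt(2 * p * (1 - p))  # center and scale each SNP column
    u, _, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :k]                             # orthonormal inferred ancestry axes


def residualize(x, pcs):
    """Remove the part of x explained by an intercept plus the PCs (least squares)."""
    design = np.column_stack([np.ones(len(pcs)), pcs])
    beta, *_ = np.linalg.lstsq(design, x, rcond=None)
    return x - design @ beta


def adjusted_stat(snp, phenotype, pcs):
    """(N - K - 1) * r^2 between ancestry-adjusted genotype and phenotype."""
    g = residualize(np.asarray(snp, dtype=float), pcs)
    y = residualize(np.asarray(phenotype, dtype=float), pcs)
    n, k = pcs.shape
    r = np.corrcoef(g, y)[0, 1]
    return (n - k - 1) * r ** 2


# Hypothetical simulation: two populations that differ in allele frequencies
# and in disease prevalence, plus a test SNP with no causal effect.
rng = np.random.default_rng(0)
n_per_pop, n_snps = 200, 500
freq_a = rng.uniform(0.2, 0.8, n_snps)
freq_b = np.clip(freq_a + rng.normal(0, 0.15, n_snps), 0.05, 0.95)
genotypes = np.vstack([
    rng.binomial(2, freq_a, size=(n_per_pop, n_snps)),
    rng.binomial(2, freq_b, size=(n_per_pop, n_snps)),
])
population = np.repeat([0, 1], n_per_pop)
# 80% cases in the first population, 20% in the second.
phenotype = np.concatenate(
    [np.arange(n_per_pop) < 160, np.arange(n_per_pop) < 40]
).astype(float)
# The test SNP differs in frequency between populations but is not causal.
test_snp = np.concatenate(
    [rng.binomial(2, 0.8, n_per_pop), rng.binomial(2, 0.2, n_per_pop)]
)

pcs = top_pcs(genotypes, k=2)
naive = len(phenotype) * np.corrcoef(test_snp, phenotype)[0, 1] ** 2
adjusted = adjusted_stat(test_snp, phenotype, pcs)
print(f"unadjusted: {naive:.1f}, PC-adjusted: {adjusted:.1f}")
```

In this toy setting the unadjusted statistic is inflated purely by ancestry, while the PC-adjusted statistic should fall back toward what is expected under no association. Real analyses would also handle missing genotypes, LD pruning, outlier removal, and the choice of the number of components, none of which is shown here.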

This helped make large case-control GWAS a dependable discovery engine. Researchers could now combine broader samples, detect subtle structure even within apparently homogeneous populations such as European Americans, and separate true disease signals from ancestry-correlated artifacts with much greater confidence. The method, implemented through EIGENSTRAT and related tools, became part of the standard statistical grammar of human genetics.

Its influence extended beyond one correction procedure. PCA-based ancestry adjustment became a routine component of GWAS quality control, replication, meta-analysis, biobank-scale genetics, and later polygenic score work. In that sense, the paper helped convert GWAS from a fragile high-throughput screen into a reproducible framework for mapping common variant associations, enabling the wave of post-2007 discoveries in complex diseases and traits. Source: Nature Genetics paper.

Abstract

(no abstract available)

Related

cite → Inference of Population Structure Using Multilocus Genotype Data — The PCA GWAS paper uses population-structure ideas established by STRUCTURE to correct ancestry stratification in association studies.
enables → LD Score regression distinguishes confounding from polygenicity in genome-wide association studies — PCA correction for population stratification defined the GWAS confounding problem that LD Score regression sought to separate from polygenic inheritance.
enables → The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans — PCA correction for population stratification enabled GTEx to adjust genotype-expression association tests for ancestry-related confounding.
cite ← GCTA: A Tool for Genome-wide Complex Trait Analysis — GCTA cites principal-components correction as a standard way to control population stratification in genome-wide association analyses.
cite ← LD Score regression distinguishes confounding from polygenicity in genome-wide association studies — LD Score regression addresses the same population stratification problem that principal components analysis corrects in genome-wide association studies.
cite ← PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses — PLINK cites EIGENSTRAT/PCA correction as a method for controlling population stratification in genome-wide association studies.
cite ← The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans — The GTEx pilot uses principal components to account for population structure and other latent confounders in association analyses.
enables ← Inference of Population Structure Using Multilocus Genotype Data — STRUCTURE modeled hidden population subgroups from multilocus genotypes, motivating PCA as a faster correction for stratification in GWAS.

Sources

DOI: https://doi.org/10.1038/ng1847
OpenAlex: https://openalex.org/W2157752701
