
Principal components analysis corrects for stratification in genome-wide association studies

Why this mattered

Before this paper, population stratification was one of the central threats to genome-wide association studies: ancestry differences between cases and controls could make ordinary allele-frequency tests report disease associations that really reflected demographic history. Price et al. made stratification correction practical at GWAS scale by using principal components to infer axes of ancestry directly from dense genotype data, then adjusting each SNP's genotypes and the phenotype along those axes before testing for association. The key shift was not that ancestry mattered, which was already well known, but that it could be modeled explicitly, efficiently, and marker by marker in studies with hundreds of thousands of SNPs.
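To make the mechanism concrete, here is a minimal pure-Python sketch of an EIGENSTRAT-style correction under stated assumptions. It normalizes a panel of SNPs, finds the top ancestry axes by power iteration (a dependency-free stand-in for a full eigendecomposition), removes those axes from a candidate SNP and from the phenotype, and reports (N − K − 1) times the squared correlation of the residuals, which is our understanding of the corrected trend statistic. The function names (`normalize_snps`, `top_axes`, `adjust`, `trend_chi2`, `simulate`), the smoothed-frequency scaling, and the toy two-population data are hypothetical illustrations, not the EIGENSTRAT code.

```python
import math
import random


def normalize_snps(genotypes):
    """Center each SNP row and scale it by sqrt(p * (1 - p)).

    genotypes: M rows (SNPs), each a list of N allele counts in {0, 1, 2}.
    ASSUMPTION: p is a smoothed allele frequency (1 + sum) / (2 + 2N).
    """
    rows = []
    for snp in genotypes:
        n = len(snp)
        mean = sum(snp) / n
        p = (1 + sum(snp)) / (2 + 2 * n)
        scale = math.sqrt(p * (1 - p))
        rows.append([(g - mean) / scale for g in snp])
    return rows


def top_axes(rows, k, iters=100, seed=0):
    """Top-k eigenvectors of the N x N matrix X^T X (X = normalized SNPs).

    Power iteration with Gram-Schmidt against earlier axes; omitting the
    1/M factor rescales eigenvalues but leaves eigenvectors unchanged.
    """
    n = len(rows[0])
    cov = [[0.0] * n for _ in range(n)]
    for x in rows:
        for i, xi in enumerate(x):
            cov_i = cov[i]
            for j, xj in enumerate(x):
                cov_i[j] += xi * xj
    rng = random.Random(seed)
    axes = []
    for _ in range(k):
        # Random start so the iterate is not orthogonal to the axis we want.
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(c * vj for c, vj in zip(row, v)) for row in cov]
            for a in axes:  # keep the new axis orthogonal to earlier ones
                dot = sum(wi * ai for wi, ai in zip(w, a))
                w = [wi - dot * ai for wi, ai in zip(w, a)]
            norm = math.sqrt(sum(wi * wi for wi in w))
            v = [wi / norm for wi in w]
        axes.append(v)
    return axes


def adjust(values, axes):
    """Center a length-N vector, then subtract its projection on each unit axis."""
    mean = sum(values) / len(values)
    r = [v - mean for v in values]
    for a in axes:
        gamma = sum(ri * ai for ri, ai in zip(r, a))
        r = [ri - gamma * ai for ri, ai in zip(r, a)]
    return r


def trend_chi2(genotype, phenotype, axes):
    """(N - K - 1) * r^2 for ancestry-adjusted genotype vs phenotype (K = len(axes))."""
    g = adjust(genotype, axes)
    y = adjust(phenotype, axes)
    sgg = sum(v * v for v in g)
    syy = sum(v * v for v in y)
    if sgg == 0 or syy == 0:
        return 0.0
    r = sum(a * b for a, b in zip(g, y)) / math.sqrt(sgg * syy)
    return (len(g) - len(axes) - 1) * r * r


def simulate(n_per_pop=60, n_snps=200, seed=1):
    """Hypothetical toy data: two populations, disease more common in population 1."""
    rng = random.Random(seed)
    pop = [0] * n_per_pop + [1] * n_per_pop

    def draw(p):  # allele count for one diploid individual
        return sum(rng.random() < p for _ in range(2))

    panel = []
    for _ in range(n_snps):
        pa = rng.uniform(0.1, 0.9)
        pb = min(0.95, max(0.05, pa + rng.choice([-0.4, 0.4])))
        panel.append([draw(pb if z else pa) for z in pop])
    # Candidate SNP differs by ancestry but is unrelated to disease within each population.
    snp = [draw(0.9 if z else 0.1) for z in pop]
    pheno = [1.0 if rng.random() < (0.9 if z else 0.1) else 0.0 for z in pop]
    return panel, snp, pheno


if __name__ == "__main__":
    panel, snp, pheno = simulate()
    axes = top_axes(normalize_snps(panel), k=1)
    print("uncorrected:", trend_chi2(snp, pheno, []))
    print("corrected:  ", trend_chi2(snp, pheno, axes))
```

On the simulated data the uncorrected statistic should be large, because both disease status and the candidate SNP track ancestry, while the corrected statistic should fall to roughly the scale of a one-degree-of-freedom chi-square. Power iteration is used only to keep the sketch free of dependencies; a real analysis would rely on an optimized linear-algebra library and far larger matrices.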

This helped make large case-control GWAS a dependable discovery engine. Researchers could now combine broader samples, detect subtle structure even within apparently homogeneous populations such as European Americans, and separate true disease signals from ancestry-correlated artifacts with much greater confidence. The method, implemented through EIGENSTRAT and related tools, became part of the standard statistical grammar of human genetics.

Its influence extended beyond one correction procedure. PCA-based ancestry adjustment became a routine component of GWAS quality control, replication, meta-analysis, biobank-scale genetics, and later polygenic score work. In that sense, the paper helped convert GWAS from a fragile high-throughput screen into a reproducible framework for mapping common variant associations, enabling the wave of post-2007 discoveries in complex diseases and traits. Source: Nature Genetics paper.


Sources