LD Score regression distinguishes confounding from polygenicity in genome-wide association studies

Why this mattered

Before LD Score regression, inflation in GWAS test statistics was ambiguous: it could reflect population stratification, cryptic relatedness, or technical bias, but it could also be the expected signature of a highly polygenic trait. Bulik-Sullivan and colleagues made that ambiguity measurable. Their key insight was that true polygenic signal should increase with a variant’s LD Score, the sum of its squared correlations with nearby variants, because high-LD variants tag more causal variation. Inflation from stratification or cryptic relatedness, by contrast, is expected to be largely unrelated to LD Score. Regressing association chi-square statistics on LD Score therefore separates a slope, which scales with SNP heritability, from an intercept that captures residual confounding and bias.
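The regression described above can be illustrated with a small simulation. Under the model in the paper, the expected chi-square statistic for variant j is N * h2 * l_j / M + N * a + 1, where N is the sample size, M the number of variants, l_j the LD Score, h2 the SNP heritability, and a the contribution of confounding. The Python sketch below is a simplified illustration, not the authors' implementation: it uses an unweighted least-squares fit, whereas the published method weights the regression and estimates standard errors with a block jackknife. The function names `simulate_chi2` and `ldsc_fit`, the gamma-distributed LD Scores, and all parameter values are assumptions made for this example.

```python
import numpy as np


def simulate_chi2(n_snps, n_samples, h2, intercept, seed=0):
    """Simulate toy LD Scores and chi-square statistics (illustrative only).

    Expected chi2 for each variant follows
        n_samples * h2 * ld / n_snps + intercept,
    where `intercept` plays the role of N*a + 1 in the model.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical LD Score distribution: right-skewed and at least 1.
    ld = 1.0 + rng.gamma(shape=2.0, scale=50.0, size=n_snps)
    expected = n_samples * h2 * ld / n_snps + intercept
    # Scaled 1-df chi-square draws whose mean equals `expected`.
    chi2 = expected * rng.chisquare(df=1, size=n_snps)
    return ld, chi2


def ldsc_fit(chi2, ld_scores, n_samples, n_snps):
    """Unweighted least-squares regression of chi2 on LD Score.

    Returns (h2_estimate, intercept). The slope equals N * h2 / M,
    so h2 is recovered as slope * M / N.
    """
    chi2 = np.asarray(chi2, dtype=float)
    ld = np.asarray(ld_scores, dtype=float)
    design = np.column_stack([np.ones_like(ld), ld])
    (intercept, slope), *_ = np.linalg.lstsq(design, chi2, rcond=None)
    return slope * n_snps / n_samples, intercept


if __name__ == "__main__":
    M, N = 200_000, 20_000
    ld, chi2 = simulate_chi2(M, N, h2=0.5, intercept=1.05, seed=0)
    h2_hat, intercept_hat = ldsc_fit(chi2, ld, n_samples=N, n_snps=M)
    print(f"mean chi2       = {chi2.mean():.2f}")
    print(f"estimated h2    = {h2_hat:.3f}")
    print(f"fitted intercept = {intercept_hat:.3f}")
```

In this toy setting the mean chi-square is far above 1, yet the fitted intercept stays close to the simulated value of 1.05. That pattern attributes most of the inflation to polygenic signal rather than confounding.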

This changed what could be learned from GWAS summary statistics alone. Researchers could estimate SNP heritability, assess whether a study’s signal was likely biological rather than artifactual, and compare results across consortia without requiring access to individual-level genotype data. That was especially important as GWAS moved from single-cohort designs to massive meta-analyses, where raw data sharing was often impossible and subtle stratification was a persistent concern.

The paper also helped shift complex-trait genetics away from a “significant loci only” view toward a genome-wide signal model. Its framework became a foundation for later methods that estimated genetic correlations between traits, partitioned heritability by functional annotation, and identified disease-relevant tissues and cell types. In that sense, LD Score regression did not merely improve GWAS quality control; it made summary-statistic genetics a scalable analytic paradigm that underpinned later work in psychiatric genetics, biobank-scale trait analysis, and functional interpretation of polygenic disease risk.
