LD Score regression distinguishes confounding from polygenicity in genome-wide association studies
Why this mattered
Before LD Score regression, inflation in GWAS test statistics was hard to interpret: it could indicate population stratification, cryptic relatedness, or technical bias, but it could also be the expected signature of a highly polygenic trait. Bulik-Sullivan and colleagues made that ambiguity measurable. Their key insight was that true polygenic signal should increase with a variant’s LD Score, the sum of its squared correlations with nearby variants, because high-LD variants tag more causal variation, whereas inflation from confounding such as stratification or cryptic relatedness is not expected to track LD. Regressing association chi-square statistics on LD Score therefore separates the two: the slope reflects polygenic heritability, and the intercept captures residual confounding and bias.
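The regression idea can be illustrated with a small simulation. Under the model in the paper, the expected chi-square statistic of variant j is N·h²·ℓ_j/M + N·a + 1, where ℓ_j is the LD Score, N the sample size, M the number of variants, h² the SNP heritability, and a the contribution of confounding. The sketch below is a simplification, not the published implementation: it uses plain unweighted least squares, whereas the published method uses a weighted regression with block-jackknife standard errors. The function name `ldsc_fit`, the simulated LD Score distribution, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np


def ldsc_fit(chi2, ld_scores, n_samples, n_snps):
    """Unweighted OLS of chi-square statistics on LD Scores (illustrative only).

    Returns (h2_estimate, intercept). The slope equals N * h2 / M under the
    LD Score regression model, so h2 = slope * M / N.
    """
    # Design matrix: a column of ones for the intercept, then the LD Scores.
    design = np.column_stack([np.ones_like(ld_scores), ld_scores])
    (intercept, slope), *_ = np.linalg.lstsq(design, chi2, rcond=None)
    return slope * n_snps / n_samples, intercept


# Hypothetical study: all values below are assumptions for the simulation.
rng = np.random.default_rng(0)
n_snps, n_samples = 200_000, 10_000
true_h2, true_intercept = 0.4, 1.05  # intercept > 1 mimics mild confounding

# Simulated LD Scores (a gamma distribution is an arbitrary stand-in).
ld_scores = rng.gamma(shape=2.0, scale=50.0, size=n_snps)

# Expected chi-square per variant, then noisy draws with that mean.
expected = n_samples * true_h2 * ld_scores / n_snps + true_intercept
chi2 = expected * rng.chisquare(df=1, size=n_snps)

h2_est, intercept_est = ldsc_fit(chi2, ld_scores, n_samples, n_snps)
print(f"estimated h2: {h2_est:.3f}, estimated intercept: {intercept_est:.3f}")
```

In this toy setting the mean chi-square is well above 1, yet most of that inflation is attributed to the slope rather than the intercept, which is the distinction between polygenicity and confounding that the method draws. Real analyses would use LD Scores estimated from a reference panel rather than simulated ones.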
This changed what could be learned from GWAS summary statistics alone. Researchers could estimate SNP heritability, assess whether a study’s signal was likely biological rather than artifactual, and compare results across consortia without requiring access to individual-level genotype data. That was especially important as GWAS moved from single-cohort designs to massive meta-analyses, where raw data sharing was often impossible and subtle stratification was a persistent concern.
The paper also helped shift complex-trait genetics away from a “significant loci only” view toward a genome-wide signal model. Its framework became a foundation for later methods that estimated genetic correlations between traits, partitioned heritability by functional annotation, and identified disease-relevant tissues and cell types. In that sense, LD Score regression did not merely improve GWAS quality control; it made summary-statistic genetics a scalable analytic paradigm, underpinning later work in psychiatric genetics, biobank-scale trait analysis, and functional interpretation of polygenic disease risk.
Abstract
(no abstract available)
Related
- cite → GCTA: A Tool for Genome-wide Complex Trait Analysis — LD Score regression contrasts its summary-statistic heritability and confounding estimates with GCTA's individual-level GREML variance-component approach.
- cite → An integrated map of genetic variation from 1,092 human genomes — LD Score regression uses 1000 Genomes reference haplotypes to estimate linkage disequilibrium scores for genome-wide association summary statistics.
- cite → Biological insights from 108 schizophrenia-associated genetic loci — LD Score regression applies its method to schizophrenia GWAS results to show inflation from polygenic signal rather than only population confounding.
- cite → Principal components analysis corrects for stratification in genome-wide association studies — LD Score regression addresses the same population stratification problem that principal components analysis corrects in genome-wide association studies.
- cite ← Biological insights from 108 schizophrenia-associated genetic loci — LD Score regression provides a method to test whether the schizophrenia GWAS loci reflect true polygenic signal rather than population stratification or confounding.
- enables ← GCTA: A Tool for Genome-wide Complex Trait Analysis — GCTA's genome-wide variance-component modeling of polygenic signal enabled LD Score regression's distinction between true polygenicity and confounding.
- enables ← Principal components analysis corrects for stratification in genome-wide association studies — PCA correction for population stratification defined the GWAS confounding problem that LD Score regression sought to separate from polygenic inheritance.
Sources
- DOI: https://doi.org/10.1038/ng.3211
- OpenAlex: https://openalex.org/W2153860431