Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles
Why this mattered
Before GSEA, genome-wide expression studies often reduced discovery to ranked lists of individually significant genes, which made biological interpretation fragile: modest but coordinated shifts across many genes could be missed. Subramanian and colleagues reframed the unit of analysis from the single gene to the biologically defined gene set, treating pathways, chromosomal regions, and regulatory programs as interpretable signals. This made it possible to ask whether a known process was systematically perturbed even when no one gene crossed a strict significance threshold.
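To make the gene-set idea concrete, here is a minimal Python sketch of the weighted running-sum enrichment score (ES) as described in the original GSEA paper. It assumes genes are already ranked by their correlation r_j with the phenotype. Walking down the ranked list, each hit (a gene in set S) raises the running sum by |r_j|^p / N_R, where N_R is the summed weight of all hits, and each miss lowers it by 1 / (N - N_H). The ES is the running sum's largest deviation from zero. The helper name `enrichment_score` and the toy data are hypothetical, and significance testing is omitted, so treat this as an illustration rather than the GSEA software's implementation.

```python
from typing import List, Sequence, Set, Tuple


def enrichment_score(
    ranked_genes: Sequence[str],
    correlations: Sequence[float],
    gene_set: Set[str],
    p: float = 1.0,
) -> Tuple[float, List[float]]:
    """Return (ES, running-sum trace) for one gene set (hypothetical helper).

    ranked_genes: genes sorted from most positively to most negatively
        correlated with the phenotype (assumed input layout).
    correlations: correlation r_j for each gene, in the same order.
    gene_set: the gene set S being scored.
    p: weight exponent; p = 1 weights each hit by |r_j|.
    """
    if len(ranked_genes) != len(correlations):
        raise ValueError("ranked_genes and correlations must have equal length")
    n = len(ranked_genes)
    is_hit = [g in gene_set for g in ranked_genes]
    n_hits = sum(is_hit)
    if n_hits == 0 or n_hits == n:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    # N_R: total weight of the hits, so all hit steps together sum to 1.
    n_r = sum(abs(r) ** p for r, hit in zip(correlations, is_hit) if hit)
    if n_r == 0:
        raise ValueError("all hits have zero weight")
    # Every miss lowers the sum by the same amount; misses also sum to 1,
    # so the running sum always returns to 0 at the end of the list.
    miss_step = 1.0 / (n - n_hits)

    running = 0.0
    trace: List[float] = []
    for r, hit in zip(correlations, is_hit):
        running += abs(r) ** p / n_r if hit else -miss_step
        trace.append(running)

    # ES: the running sum's largest deviation from zero, sign preserved.
    es = max(trace, key=abs)
    return es, trace


if __name__ == "__main__":
    genes = [f"g{i}" for i in range(1, 11)]
    corr = [0.3, 0.25, 0.2, 0.1, 0.05, -0.05, -0.1, -0.2, -0.25, -0.3]
    es, _ = enrichment_score(genes, corr, {"g1", "g2", "g3"})
    print(round(es, 3))  # prints 1.0
```

In this toy run no correlation exceeds 0.3, yet the three set members clustered at the top of the ranking push the running sum to its maximum possible value of 1.0. That is the kind of coordinated, modest shift a per-gene significance threshold can miss.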
The paper mattered because it linked high-throughput expression profiling to accumulated biological knowledge. When the authors compared two independent lung cancer survival studies, single-gene comparisons showed little agreement, yet GSEA found shared pathway-level structure, illustrating a route toward reproducible interpretation in noisy clinical genomics. Its accompanying software and curated database of 1,325 gene sets also turned the method into infrastructure, not merely a statistical proposal.
That shift helped shape the next generation of functional genomics: pathway enrichment, signature scoring, and knowledge-based interpretation became standard companions to microarrays, RNA-seq, cancer subtype analysis, perturbation screens, and later single-cell studies. The broader lesson was durable: genome-scale data become more explanatory when analyzed in terms of coordinated biological programs rather than isolated molecular measurements.
Abstract
Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.
Related
- enables → Spatial reconstruction of single-cell gene expression data — Gene set enrichment analysis links them through marker-gene pathway interpretation used to infer tissue regions from single-cell expression profiles.
- cite ← Spatial reconstruction of single-cell gene expression data — Spatial reconstruction of single-cell expression uses gene set enrichment analysis to interpret spatially patterned transcriptional programs.