Differential expression analysis for sequence count data¶
Abstract¶
High-throughput sequencing assays such as RNA-Seq, ChIP-Seq or barcode counting provide quantitative readouts in the form of count data. To infer differential signal in such data correctly and with good statistical power, estimation of data variability throughout the dynamic range and a suitable error model are required. We propose a method based on the negative binomial distribution, with variance and mean linked by local regression, and present an implementation, DESeq, as an R/Bioconductor package.
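The abstract's core idea, a negative binomial error model whose variance is tied to the mean by a smooth fit, can be illustrated with a small sketch. This is not DESeq's implementation: the parameterization `variance = mu + alpha * mu**2` is one common way to write the negative binomial variance, the tricube-weighted local line is a simple stand-in for the local regression the abstract mentions, and all function names and the replicate counts are hypothetical. It also assumes the counts are already comparable across samples.

```python
from statistics import mean, variance


def nb_variance(mu, alpha):
    """Negative binomial variance for mean mu and dispersion alpha.

    alpha = 0 reduces to the Poisson case (variance equals mean).
    """
    return mu + alpha * mu ** 2


def moment_dispersion(counts):
    """Method-of-moments dispersion estimate from replicate counts of one gene.

    Returns 0.0 when the sample variance does not exceed the mean.
    """
    m = mean(counts)
    v = variance(counts)  # unbiased sample variance
    if m == 0:
        return 0.0
    return max((v - m) / m ** 2, 0.0)


def local_linear_fit(xs, ys, x0, bandwidth):
    """Fit a tricube-weighted least-squares line around x0; return its value at x0."""
    ws = []
    for x in xs:
        d = abs(x - x0) / bandwidth
        ws.append((1 - d ** 3) ** 3 if d < 1 else 0.0)
    sw = sum(ws)
    if sw == 0:
        raise ValueError("no points within bandwidth of x0")
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    if sxx == 0:
        return ybar  # all weight on one x value: fall back to the weighted mean
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    return ybar + (sxy / sxx) * (x0 - xbar)


# Hypothetical replicate counts for three genes (illustration only).
genes = {
    "geneA": [90, 110, 130, 70],
    "geneB": [8, 12, 15, 5],
    "geneC": [950, 1100, 1200, 750],
}
means = [mean(c) for c in genes.values()]
variances = [variance(c) for c in genes.values()]

# Smoothed variance at mean 100, pooling information from all three genes.
smoothed = local_linear_fit(means, variances, x0=100, bandwidth=2000)
print(f"per-gene variances: {variances}")
print(f"smoothed variance at mean 100: {smoothed:.1f}")
```

With only a handful of replicates, a per-gene variance estimate is noisy. Fitting variance as a smooth function of the mean lets genes of similar expression strength share information, which is the motivation the abstract gives for estimating variability across the whole dynamic range.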
Related¶
- cite → Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments — DESeq cites limma's empirical Bayes differential-expression framework as a precedent while replacing microarray normal models with count-based modeling.
- cite → Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing — DESeq uses Benjamini-Hochberg false discovery rate control to adjust many gene-level differential-expression tests.
- cite → Ultrafast and memory-efficient alignment of short DNA sequences to the human genome — DESeq cites Bowtie because RNA-seq read counts are obtained by aligning short sequencing reads to a reference genome.
- cite → Mapping and quantifying mammalian transcriptomes by RNA-Seq — DESeq builds on RNA-Seq transcriptome mapping by providing statistical differential-expression tests for the read-count data it produces.
- cite → Bioconductor: open software development for computational biology and bioinformatics — DESeq is implemented within the Bioconductor ecosystem for open-source computational biology and bioinformatics workflows.
- enables → The GTEx Consortium atlas of genetic regulatory effects across human tissues — DESeq's count-based differential expression modeling enabled GTEx analyses of RNA-seq gene expression variation across tissues and samples.
- cite ← The GTEx Consortium atlas of genetic regulatory effects across human tissues — The GTEx atlas cites DESeq as a statistical method for modeling RNA-seq count data in differential expression analyses.
- enables ← Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments — limma's practice of borrowing information across genes to stabilize gene-wise variance estimates informed DESeq's approach of pooling information across genes by linking variance to the mean through local regression.
- enables ← Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing — Benjamini-Hochberg false discovery rate control enabled DESeq to report genome-wide differential expression calls with adjusted multiple-testing significance.
- enables ← Bioconductor: open software development for computational biology and bioinformatics — Bioconductor enabled DESeq by providing the R-based package ecosystem, data structures, and distribution channel for reproducible bioinformatics software.
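
The Benjamini-Hochberg adjustment referenced in the list above can be sketched as follows. This is a minimal, dependency-free illustration of the standard step-up procedure, not code taken from DESeq; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values (illustration only).
pvals = [0.01, 0.04, 0.03, 0.005]
padj = benjamini_hochberg(pvals)
significant = [i for i, p in enumerate(padj) if p <= 0.05]
print(padj, significant)
```

Calling a gene significant when its adjusted p-value is at most a chosen threshold controls the expected fraction of false discoveries among the calls at that threshold, under the procedure's usual assumptions.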