
Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing

Why this mattered

Benjamini and Hochberg changed the default question in large-scale testing from “How can we avoid even one false positive?” to “How can we keep the fraction of false positives among reported findings acceptably small?” That shift in the question was decisive. Familywise error rate control, typified by Bonferroni-style procedures, was appropriate for settings where any single false rejection was costly. It became increasingly ill-suited to experiments testing hundreds, thousands, or millions of hypotheses, because a Bonferroni threshold of α/m per test shrinks as the number of tests m grows. By formalizing the false discovery rate as the expected proportion of false rejections among all rejections (with the proportion defined as zero when nothing is rejected), the paper supplied an error criterion that matched discovery-oriented science more closely: some false leads could be tolerated as long as the overall list of findings remained statistically disciplined.
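The two error criteria can be stated compactly. The block below uses a common notation (our choice of symbols, close to the paper's own) in which, among m tested hypotheses, V is the number of true null hypotheses that are rejected and R is the total number of rejections. The comparison that follows is a short derivation, not a quotation from the paper.

```latex
\mathrm{FWER} = \Pr(V \ge 1), \qquad
\mathrm{FDR} = \mathbb{E}\!\left[\frac{V}{\max(R,\,1)}\right].

% Since 0 \le V \le R, the false discovery proportion is bounded by an indicator:
\frac{V}{\max(R,\,1)} \le \mathbf{1}\{V \ge 1\}
\quad\Longrightarrow\quad \mathrm{FDR} \le \mathrm{FWER}.

% If every null hypothesis is true, every rejection is false, so V = R and
\frac{V}{\max(R,\,1)} = \mathbf{1}\{V \ge 1\}
\quad\Longrightarrow\quad \mathrm{FDR} = \mathrm{FWER}.
```

Any procedure that controls the FWER at level α therefore also controls the FDR at α. When some null hypotheses are false, the FDR criterion is the weaker requirement, which leaves room for more rejections.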

The practical force of the paper lay in pairing that new criterion with a simple step-up procedure. This procedure was more powerful than conventional familywise-error methods while still guaranteeing, for independent test statistics, that the false discovery rate stays at or below the chosen level. This made multiple testing usable at the scale demanded by genomics, neuroimaging, high-throughput biology, astronomy, economics, and modern machine learning evaluation. Researchers could now produce ranked sets of candidate genes, associations, voxels, signals, or features without either ignoring multiplicity or erasing most true effects through overly conservative correction.
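Below is a minimal sketch of the step-up rule. It assumes NumPy, and the function name `benjamini_hochberg` is chosen here for illustration. Given m p-values and a target level `alpha`, the function sorts the p-values and finds the largest rank k with p_(k) ≤ k·alpha/m. It then rejects the k hypotheses with the smallest p-values.

```python
import numpy as np

def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a boolean mask of hypotheses rejected by the BH step-up rule.

    Illustrative sketch: sort the m p-values, find the largest rank k with
    p_(k) <= k * alpha / m, and reject the k smallest p-values.
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                       # indices that sort p ascending
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds              # compare p_(i) with i * alpha / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()          # largest (0-based) rank meeting its bound
        reject[order[: k + 1]] = True           # step-up: reject every rank up to k
    return reject

reject = benjamini_hochberg([0.04, 0.001, 0.035, 0.03], alpha=0.05)
print(reject)  # [ True  True  True  True]
```

In the usage line, 0.03 exceeds its own threshold (2 × 0.05 / 4 = 0.025), yet it is still rejected. The reason is that a larger p-value, 0.04, meets its threshold of 0.05. That is the step-up behavior: the largest qualifying rank determines the cutoff, so a single intermediate failure does not block rejection. A Bonferroni correction at 0.05 / 4 = 0.0125 would reject only the p-value 0.001.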

Its influence also came from creating a language for later work. Subsequent developments extended FDR control to dependent tests, adaptive procedures, empirical Bayes methods, local false discovery rates, q-values, and large-scale selective inference. Those advances did not merely refine a technical correction; they helped define the statistical infrastructure of the high-dimensional sciences. The 1995 paper mattered because it made mass discovery statistically legitimate: it replaced an error standard designed for scarcity of hypotheses with one suited to an era in which scientific instruments routinely generate vast numbers of simultaneous questions.

Abstract

SUMMARY The common approach to the multiplicity problem calls for controlling the familywise error rate (FWER). This approach, though, has faults, and we point out a few. A different approach to problems of multiple significance testing is presented. It calls for controlling the expected proportion of falsely rejected hypotheses — the false discovery rate. This error rate is equivalent to the FWER when all hypotheses are true but is smaller otherwise. Therefore, in problems where the control of the false discovery rate rather than that of the FWER is desired, there is potential for a gain in power. A simple sequential Bonferroni-type procedure is proved to control the false discovery rate for independent test statistics, and a simulation study shows that the gain in power is substantial. The use of the new procedure and the appropriateness of the criterion are illustrated with examples.
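The abstract's power claim can be explored with a small Monte Carlo sketch. This is not a reproduction of the paper's simulation study. The setup is hypothetical: m = 200 independent one-sided z-tests, of which the first 50 are false nulls whose statistics are shifted by 3, tested at alpha = 0.05. The helper names (`bonferroni`, `one_sided_pvalues`, `simulate`) are illustrative, and NumPy is assumed. For each method, the sketch estimates average power, the FDR (mean false discovery proportion), and the FWER (fraction of runs with at least one false rejection).

```python
import math
import numpy as np

def benjamini_hochberg(p, alpha):
    # BH step-up: reject the k smallest p-values, k = max{i : p_(i) <= i * alpha / m}
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        reject[order[: np.nonzero(below)[0].max() + 1]] = True
    return reject

def bonferroni(p, alpha):
    # FWER control: reject only p-values at or below alpha / m
    return p <= alpha / p.size

def one_sided_pvalues(z):
    # Upper-tail p-value of a standard normal statistic: P(Z > z)
    return np.array([0.5 * math.erfc(v / math.sqrt(2.0)) for v in z])

def simulate(m=200, m1=50, shift=3.0, alpha=0.05, reps=500, seed=0):
    """Estimate power, FDR and FWER of Bonferroni and BH (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    is_false_null = np.zeros(m, dtype=bool)
    is_false_null[:m1] = True                       # first m1 hypotheses carry a real effect
    rules = {"bonferroni": bonferroni, "bh": benjamini_hochberg}
    records = {name: {"power": [], "fdp": [], "any_false": []} for name in rules}
    for _ in range(reps):
        z = rng.standard_normal(m) + shift * is_false_null
        p = one_sided_pvalues(z)
        for name, rule in rules.items():
            rej = rule(p, alpha)
            v = int(np.sum(rej & ~is_false_null))   # false rejections V
            r = int(rej.sum())                      # total rejections R
            records[name]["power"].append(np.sum(rej & is_false_null) / m1)
            records[name]["fdp"].append(v / max(r, 1))  # false discovery proportion
            records[name]["any_false"].append(v >= 1)
    return {
        name: {
            "power": float(np.mean(rec["power"])),
            "fdr": float(np.mean(rec["fdp"])),
            "fwer": float(np.mean(rec["any_false"])),
        }
        for name, rec in records.items()
    }

for name, summary in simulate().items():
    print(name, summary)
```

Because every p-value at or below alpha / m also passes the BH threshold at its rank, BH rejects a superset of the Bonferroni rejections in every run. Its estimated power can therefore never fall below Bonferroni's. How large the gap is depends on the assumed m, effect size and proportion of false nulls.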

Sources