Statistical Aspects of the Analysis of Data From Retrospective Studies of Disease

Why this mattered

Mantel and Haenszel’s 1959 paper helped make the retrospective case-control study a rigorous inferential design rather than a merely suggestive way to collect disease histories. Its central shift was to treat misleading associations as a statistical design problem: inappropriate control groups and uncontrolled third factors could generate apparent disease-exposure links, but matching and subclassification could reduce that risk if the analysis was changed accordingly. The paper supplied that analysis, giving a stratified chi-square test and a summary relative-risk estimator that combined evidence across controlled subcategories instead of collapsing heterogeneous data into a single crude comparison.

What became newly possible was credible inference from observational disease data when randomized or prospective studies were impractical, slow, or unethical. Investigators could compare diseased and non-diseased groups while explicitly adjusting for age, sex, site, or other confounders, and could report both statistical significance and an interpretable measure of association. This was especially important for cancer epidemiology: the paper’s examples from pulmonary carcinoma date from the same period in which smoking, occupational exposures, and other chronic-disease risks were being established through nonrandomized evidence.

The subsequent breakthrough was methodological standardization. The Mantel-Haenszel test and estimator became foundational tools in epidemiology, biostatistics, clinical research, and later meta-analysis, where the same logic of combining stratum-specific associations under controlled heterogeneity reappeared. Modern logistic regression, conditional likelihood methods for matched case-control studies, causal adjustment, and pooled analyses use more general machinery, but they inherit the paper’s core paradigm: observational associations are not self-interpreting; they become scientifically useful only when the design and analysis explicitly confront confounding and preserve the structure of comparison.

Abstract

The role and limitations of retrospective investigations of factors possibly associated with the occurrence of a disease are discussed and their relationship to forward-type studies emphasized. Examples of situations in which misleading associations could arise through the use of inappropriate control groups are presented. The possibility of misleading associations may be minimized by controlling or matching on factors which could produce such associations; the statistical analysis will then be modified. Statistical methodology is presented for analyzing retrospective study data, including chi-square measures of statistical significance of the observed association between the disease and the factor under study, and measures for interpreting the association in terms of an increased relative risk of disease. An extension of the chi-square test to the situation where data are subclassified by factors controlled in the analysis is given. A summary relative risk formula, R, is presented and discussed in connection with the problem of weighting the individual subcategory relative risks according to their importance or their precision. Alternative relative-risk formulas, R1, R2, R3, and R4, which require the calculation of subcategory-adjusted proportions of the study factor among diseased persons and controls for the computation of relative risks, are discussed. While these latter formulas may be useful in many instances, they may be biased or inconsistent and are not, in fact, averages of the relative risks observed in the separate subcategories. Only the relative-risk formula, R, of those presented, can be viewed as such an average. The relationship of the matched-sample method to the sub-classification approach is indicated. The statistical methodology presented is illustrated with examples from a study of women with epidermoid and undifferentiated pulmonary carcinoma.
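The summary formula R and the subclassified chi-square test described in the abstract can be illustrated with a short sketch. It assumes each subcategory is a 2x2 table (a, b, c, d) counting diseased persons with and without the factor and controls with and without the factor, and it uses the forms in which these statistics are usually stated today: R as the ratio of the sums of a·d/n and b·c/n, and a continuity-corrected chi-square on one degree of freedom. The function name, the tuple layout, the clamp at zero, and the counts in the usage note are hypothetical choices for illustration, not the paper's notation or data.

```python
from typing import Iterable, Tuple

# One stratum (subcategory) as a 2x2 table (a, b, c, d). Assumed layout:
#   a = diseased, factor present    b = diseased, factor absent
#   c = controls, factor present    d = controls, factor absent
Table = Tuple[float, float, float, float]


def mantel_haenszel(strata: Iterable[Table]) -> Tuple[float, float]:
    """Return (summary relative risk R, continuity-corrected chi-square)."""
    num = den = 0.0                       # running sums for R
    observed = expected = variance = 0.0  # running sums for the chi-square
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small contributes no information
        num += a * d / n
        den += b * c / n
        observed += a
        # Expectation and variance of a given the margins, under no association.
        expected += (a + b) * (a + c) / n
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    if den == 0 or variance == 0:
        raise ValueError("strata carry too little information for a summary")
    summary_r = num / den
    # Continuity correction of 1/2; clamping at zero is an assumption made
    # here so that tiny deviations are not inflated by the correction.
    deviation = max(abs(observed - expected) - 0.5, 0.0)
    chi_square = deviation ** 2 / variance
    return summary_r, chi_square
```

For example, `mantel_haenszel([(10, 20, 5, 25), (8, 12, 6, 14)])` pools two made-up strata into one summary estimate and one test statistic. With a single stratum the function reduces to the ordinary cross-product ratio a·d/(b·c), which makes explicit the sense in which R is a weighted average of the subcategory relative risks.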

  • enables: Regression Models and Life-Tables. Mantel and Haenszel's retrospective disease analysis formalized stratified risk estimation, a precursor to Cox's regression framework for censored survival data.
  • cite: Regression Models and Life-Tables. Cox's proportional hazards model generalizes retrospective disease-risk regression ideas to censored survival and life-table data.

Sources