Regression Models and Life-Tables

Why this mattered

Cox’s 1972 paper changed survival analysis by separating the effect of covariates from the baseline shape of risk over time. Before it, modeling censored failure-time data often required committing to a fully specified life-table or parametric failure distribution. The proportional hazards model made it possible to estimate how treatments, exposures, or patient characteristics changed the instantaneous risk of an event while leaving the underlying baseline hazard arbitrary. The key technical move was what is now called the partial likelihood (the paper itself calls it a conditional likelihood), which uses the ordering of observed failures to infer regression coefficients without first estimating the unknown time-dependent baseline hazard.
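To make the role of failure ordering concrete, here is a minimal Python sketch of the log partial likelihood. It assumes no tied event times, covariates that are fixed over time, and right censoring only; the function name and the toy data are hypothetical and are not taken from the paper. Each observed failure contributes the log of its own relative risk divided by the summed relative risks of everyone still under observation at that time.

```python
import math

def cox_partial_log_likelihood(beta, times, events, covariates):
    """Log partial likelihood for a proportional hazards model.

    Sketch only: assumes no tied event times and time-fixed covariates.
    events[i] is 1 for an observed failure and 0 for a censored time.
    """
    # Linear predictor z_i . beta for each subject
    eta = [sum(b * z for b, z in zip(beta, row)) for row in covariates]
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects enter only through the risk sets
        # Risk set: subjects still under observation at time t_i
        risk = [eta[j] for j, t_j in enumerate(times) if t_j >= t_i]
        loglik += eta[i] - math.log(sum(math.exp(e) for e in risk))
    return loglik

# Hypothetical toy data: four subjects, one binary covariate
times = [2.0, 3.0, 5.0, 7.0]
events = [1, 0, 1, 1]
covariates = [[1.0], [0.0], [1.0], [0.0]]

ll_zero = cox_partial_log_likelihood([0.0], times, events, covariates)
ll_half = cox_partial_log_likelihood([0.5], times, events, covariates)

# Squaring the times preserves their order, so the value is unchanged
squared = [t ** 2 for t in times]
ll_half_squared = cox_partial_log_likelihood([0.5], squared, events, covariates)
```

In this sketch `ll_half` and `ll_half_squared` agree because only the order of the times enters the calculation, which is why the baseline hazard never has to be specified. Production software also handles ties, for example with the Breslow or Efron approximations, which this sketch omits.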

This made regression-style inference practical for censored time-to-event data across medicine, epidemiology, engineering, economics, and the social sciences. Researchers could now adjust for multiple explanatory variables, handle incomplete follow-up, and express results as hazard ratios in a framework that was both interpretable and flexible. The paper therefore turned survival analysis from a specialized actuarial life-table problem into a general modeling language for longitudinal risk.

Its influence also shaped later breakthroughs in semiparametric statistics and event-history modeling. The Cox model became a prototype for methods that combine finite-dimensional parameters of scientific interest with infinite-dimensional nuisance components, and it inspired extensive work on counting-process formulations, robust variance estimation, frailty models, time-varying covariates, competing risks, and causal survival methods. Much of modern clinical-trial analysis and observational risk modeling still rests on the conceptual compromise introduced here: enough structure to estimate meaningful covariate effects, but not so much structure that the baseline course of risk must be known in advance.

Abstract

The analysis of censored failure times is considered. It is assumed that on each individual are available values of one or more explanatory variables. The hazard function (age-specific failure rate) is taken to be a function of the explanatory variables and unknown regression coefficients multiplied by an arbitrary and unknown function of time. A conditional likelihood is obtained, leading to inferences about the unknown regression coefficients. Some generalizations are outlined.
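In symbols, the model and likelihood described above are commonly written as follows. The notation is an assumed conventional choice rather than a quotation: z is a row vector of explanatory variables, \beta the regression coefficients, \lambda_0(t) the arbitrary function of time, t_{(1)} < ... < t_{(k)} the distinct observed failure times (assuming no ties), and R(t_{(i)}) the set of individuals still at risk just before t_{(i)}.

```latex
\[
  \lambda(t; z) = \lambda_0(t)\,\exp(z\beta),
  \qquad
  L(\beta) = \prod_{i=1}^{k}
    \frac{\exp\bigl(z_{(i)}\beta\bigr)}
         {\sum_{l \in R(t_{(i)})} \exp\bigl(z_{(l)}\beta\bigr)}
\]
```

Because \lambda_0(t) appears in both the numerator and the denominator of each factor, it cancels, so L(\beta) depends on the data only through the covariates and the order of the failures.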

Sources