Maximum Likelihood from Incomplete Data Via the EM Algorithm¶
Why this mattered¶
Before Dempster, Laird, and Rubin, many likelihood problems with missing, censored, latent, grouped, or otherwise incomplete data were treated as separate computational tricks. This paper unified them under a single principle: alternate between estimating the expected complete-data log likelihood and maximizing it. The key shift was not merely a new optimization routine, but a way to make latent-variable likelihoods operational. Problems that had seemed analytically awkward or computationally fragmented could now be expressed as instances of one general algorithm with a monotone likelihood property.
That unification made maximum-likelihood estimation practical across fields where the quantities of interest are only indirectly observed. Finite mixture models, factor analysis, variance components, censored-data models, missing-data analysis, and empirical Bayes or hyperparameter estimation all gained a common computational language. The paper showed that “incomplete data” was not a special nuisance case but a broad structural pattern: if a simpler complete-data model could be imagined, estimation could often proceed by iteratively filling in its sufficient information in expectation.
Its influence is visible in much of later statistical machine learning. EM became a standard engine for mixture modeling, hidden Markov models, probabilistic clustering, latent-class analysis, and many graphical-model methods, and it helped normalize the idea that inference and optimization could be interleaved around unobserved variables. Later variational inference, expectation propagation, and modern latent-variable learning methods do not simply replace EM, but they inherit its central viewpoint: difficult observed-data likelihoods can often be attacked by introducing latent structure and optimizing a tractable surrogate.
Abstract¶
Summary A broadly applicable algorithm for computing maximum likelihood estimates from incomplete data is presented at various levels of generality. Theory showing the monotone behaviour of the likelihood and convergence of the algorithm is derived. Many examples are sketched, including missing value situations, applications to grouped, censored or truncated data, finite mixture models, variance component estimation, hyperparameter estimation, iteratively reweighted least squares and factor analysis.
Related¶
- enables → RELION: Implementation of a Bayesian approach to cryo-EM structure determination — The EM algorithm enables RELION through iterative maximum-likelihood estimation of hidden particle orientations and class assignments in cryo-EM.
- enables → Meta-analysis in clinical trials — The EM algorithm enabled clinical-trial meta-analysis by providing a likelihood framework for estimating pooled effects with incomplete or latent study-level information.
- enables → Learning the parts of objects by non-negative matrix factorization — The EM algorithm supplied the iterative latent-variable estimation template used in probabilistic interpretations and multiplicative updates for non-negative matrix factorization.
- cite ← Evolutionary trees from DNA sequences: A maximum likelihood approach — Felsenstein’s phylogenetic likelihood method cites EM as a general maximum-likelihood strategy for models with unobserved latent variables.
- cite ← RELION: Implementation of a Bayesian approach to cryo-EM structure determination — RELION adapts the EM algorithm's maximum-likelihood treatment of incomplete data to infer cryo-EM orientations, classes, and reconstructions from noisy particle images.
- cite ← Meta-analysis in clinical trials — The EM algorithm provides maximum-likelihood estimation machinery for handling incomplete or latent data structures relevant to meta-analytic clinical trial models.
- cite ← Learning the parts of objects by non-negative matrix factorization — Lee and Seung use the EM algorithm as the optimization precedent for iterative maximum-likelihood-style updates in non-negative matrix factorization.