A View of the EM Algorithm that Justifies Incremental,
Sparse, and Other Variants
Radford M. Neal
Department of Statistics and
Department of Computer Science
University of Toronto
Geoffrey E. Hinton
Department of Computer Science
University of Toronto
Abstract
The EM algorithm performs maximum likelihood estimation for data in
which some variables are unobserved. We present a function that resembles negative free
energy and show that the M step maximizes this function with respect to the model
parameters and the E step maximizes it with respect to the distribution over the
unobserved variables. From this perspective, it is easy to justify an incremental variant
of the EM algorithm in which the distribution for only one of the unobserved variables is
recalculated in each E step. This variant is shown empirically to give faster convergence
in a mixture estimation problem. A variant of the algorithm that exploits sparse
conditional distributions is also described, and a wide range of other variant algorithms
are also seen to be possible.
In M. I. Jordan (editor) Learning in Graphical Models, pp.
355-368, Dordrecht: Kluwer Academic Publishers (1998)
Download: [postscript] [pdf]
This is a revision of an earlier draft from February 1993.
[home page] [publications]