A View of the EM Algorithm that Justifies Incremental, Sparse, and Other Variants
  Radford M. Neal
  Department of Statistics and 
  Department of Computer Science
  University of Toronto
  Geoffrey E. Hinton 
  Department of Computer Science
  University of Toronto
  
  Abstract
  The EM algorithm performs maximum likelihood estimation for data in
  which some variables are unobserved. We present a function that resembles negative free
  energy and show that the M step maximizes this function with respect to the model
  parameters and the E step maximizes it with respect to the distribution over the
  unobserved variables. From this perspective, it is easy to justify an incremental variant
  of the EM algorithm in which the distribution for only one of the unobserved variables is
  recalculated in each E step. This variant is shown empirically to give faster convergence
  in a mixture estimation problem. A variant of the algorithm that exploits sparse
  conditional distributions is also described, and a wide range of other variant
  algorithms is seen to be possible.
  In M. I. Jordan (editor), Learning in Graphical Models, pp. 355-368.
  Dordrecht: Kluwer Academic Publishers, 1998.
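
  As a point of reference, the function referred to in the abstract takes the familiar
  free-energy form. The notation below is a sketch in standard terms, not quoted from the
  paper: for observed data y, unobserved variables z, model parameters \theta, and any
  distribution \tilde{P} over z,

      F(\tilde{P}, \theta) = E_{\tilde{P}}[\log P(y, z \mid \theta)] + H(\tilde{P}),

  where H denotes entropy. With \theta fixed, F is maximized over \tilde{P} by setting
  \tilde{P}(z) = P(z \mid y, \theta), which is the E step; with \tilde{P} fixed, maximizing
  F over \theta is the M step. When \tilde{P} factors over independent data items, the
  incremental variant updates only one factor in each E step.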
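
  To make the incremental variant concrete, the following is an illustrative Python sketch
  for a one-dimensional Gaussian mixture. The function name incremental_em and all
  implementation details are hypothetical, not the authors' code. Each inner iteration
  recomputes the responsibilities of a single data item, swaps that item's old contribution
  out of the sufficient statistics, and immediately re-estimates the parameters.

  import numpy as np

  def incremental_em(x, K, n_sweeps=20, seed=0):
      # Illustrative incremental EM for a 1-D Gaussian mixture (hypothetical sketch).
      rng = np.random.default_rng(seed)
      n = len(x)

      # Initialize mixing proportions, means, and variances.
      pi = np.full(K, 1.0 / K)
      mu = rng.choice(x, K, replace=False)
      var = np.full(K, np.var(x))

      # Responsibilities and the sufficient statistics they determine.
      r = np.full((n, K), 1.0 / K)
      S0 = r.sum(axis=0)          # sum_i r[i, k]
      S1 = r.T @ x                # sum_i r[i, k] * x[i]
      S2 = r.T @ (x * x)          # sum_i r[i, k] * x[i]^2

      for _ in range(n_sweeps):
          for i in range(n):
              # Incremental E step: recompute responsibilities of item i only.
              log_p = (np.log(pi)
                       - 0.5 * np.log(2 * np.pi * var)
                       - 0.5 * (x[i] - mu) ** 2 / var)
              r_new = np.exp(log_p - log_p.max())
              r_new /= r_new.sum()

              # Replace item i's contribution to the sufficient statistics.
              S0 += r_new - r[i]
              S1 += (r_new - r[i]) * x[i]
              S2 += (r_new - r[i]) * x[i] ** 2
              r[i] = r_new

              # M step: re-estimate parameters from the current statistics.
              pi = S0 / n
              mu = S1 / S0
              var = np.maximum(S2 / S0 - mu ** 2, 1e-6)  # guard against collapse

      return pi, mu, var

  # Example: fit a two-component mixture to synthetic data.
  data_rng = np.random.default_rng(1)
  x = np.concatenate([data_rng.normal(-2, 1, 200), data_rng.normal(3, 1, 200)])
  print(incremental_em(x, K=2))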
  This is a revision of an earlier draft from February 1993. 