Abstract
  It is possible to combine multiple probabilistic models of the same
  data by multiplying the probabilities together and then renormalizing. This is a very
  efficient way to model high-dimensional data which simultaneously satisfies many different
  low-dimensional constraints because each individual expert model can focus on giving high
  probability to data vectors that satisfy just one of the constraints.  Data vectors
  that satisfy this one constraint but violate other constraints will be ruled out by their
  low probability under the other expert models. Training a product of experts appears
  difficult because, in addition to maximizing the probabilities that each individual expert
  assigns  to the observed data, it is necessary to make the experts be as different as
  possible.  This ensures that the product of their distributions is small which allows
  the renormalization to magnify the probability of the data under the product of experts
  model.   Fortunately, if the individual experts  are tractable there is a
  fairly efficient way to train a product of experts. 
  Download:  [ps.gz] or [pdf]
  [home page]  [publications]