**Abstract**

It is possible to combine multiple probabilistic models of the same
data by multiplying the probabilities together and then renormalizing. This is a very
efficient way to model high-dimensional data which simultaneously satisfies many different
low-dimensional constraints because each individual expert model can focus on giving high
probability to data vectors that satisfy just one of the constraints. Data vectors
that satisfy this one constraint but violate other constraints will be ruled out by their
low probability under the other expert models. Training a product of experts appears
difficult because, in addition to maximizing the probabilities that each individual expert
assigns to the observed data, it is necessary to make the experts be as different as
possible. This ensures that the product of their distributions is small which allows
the renormalization to magnify the probability of the data under the product of experts
model. Fortunately, if the individual experts are tractable there is a
fairly efficient way to train a product of experts.

Download: [ps.gz] or [pdf]

[home page] [publications]