Three ways to combine probability density models (an underlying theme of the tutorial)
• Mixture:  Take a weighted average of the distributions.
– A weighted average at any point is at most the largest of the component densities, so a mixture can never be sharper than the individual distributions. It’s a very weak way to combine models (see the first sketch after this list).
• Product: Multiply the distributions at each point and then renormalize (a product of experts).
– Exponentially more powerful than a mixture. The normalization term makes maximum likelihood learning difficult, but approximations allow us to learn anyway (see the second sketch).
• Composition: Use the values of the latent variables of one model as the data for the next model.
– Works well for learning multiple layers of representation, but only if the individual models are undirected (see the third sketch).
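
A minimal sketch of the mixture case, assuming NumPy and SciPy are available; the two Gaussian components and the mixing weight are illustrative choices, not anything from the tutorial.

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-5, 5, 1000)
p1 = norm.pdf(x, loc=-1.0, scale=1.0)   # first component density
p2 = norm.pdf(x, loc=2.0, scale=0.5)    # second component density
w = 0.3                                  # mixing weight

# Weighted average; already normalized because the weights sum to 1.
mixture = w * p1 + (1 - w) * p2

# A convex combination is bounded by the pointwise maximum of its
# components, so the mixture is never sharper than the sharpest one.
assert np.all(mixture <= np.maximum(p1, p2) + 1e-12)
```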
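
The same two densities combined as a product, with the partition function approximated on the grid (again an illustrative setup, not a training procedure):

```python
product = p1 * p2                        # multiply at each point
dx = x[1] - x[0]
Z = product.sum() * dx                   # grid estimate of the normalizer
product /= Z

# Each factor can veto regions the other likes, so the product is
# sharper than either component: its peak exceeds both of theirs.
assert product.max() > max(p1.max(), p2.max())
```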
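
A minimal sketch of composition: the latent activities inferred by one model become the data for the next. The models here are reduced to their inference step, and the weight matrices are random stand-ins rather than learned RBM weights.

```python
rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

v = rng.integers(0, 2, size=(100, 20)).astype(float)  # 100 binary data vectors

W1 = rng.normal(scale=0.1, size=(20, 10))  # first model: 20 visible, 10 latent units
h1 = sigmoid(v @ W1)                        # latent activities of the first model

W2 = rng.normal(scale=0.1, size=(10, 5))   # second model treats h1 as its data
h2 = sigmoid(h1 @ W2)                       # next layer of representation
```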