Three ways to combine probability density models
Mixture: A weighted average of the distributions.
It can never be sharper than the sharpest of the individual distributions.
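A minimal sketch of a mixture of two 1-D Gaussians, assuming "sharpness" is measured by peak density height. The helper names (`gaussian_pdf`, `mixture_pdf`) and all parameter values are hypothetical choices for illustration. Because the mixture is a weighted average, its value at every point lies between the component values there, so its peak cannot exceed the tallest component's peak.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def mixture_pdf(x, weights, mus, sigmas):
    """Weighted average of the component densities (weights must sum to 1)."""
    weights = np.asarray(weights, dtype=float)
    assert np.isclose(weights.sum(), 1.0), "mixing weights must sum to 1"
    comps = [gaussian_pdf(x, m, s) for m, s in zip(mus, sigmas)]
    return sum(w * c for w, c in zip(weights, comps))

# Evaluate on a fine grid (hypothetical example parameters).
x = np.linspace(-10.0, 10.0, 20001)
mus, sigmas = [-1.0, 1.5], [1.0, 0.8]
p_mix = mixture_pdf(x, [0.3, 0.7], mus, sigmas)

# p_mix(x) <= max_k p_k(x) at every x, so the mixture's peak is bounded
# by the tallest component's peak.
component_peaks = [gaussian_pdf(x, m, s).max() for m, s in zip(mus, sigmas)]
print(p_mix.max() <= max(component_peaks))  # True
```

No renormalization is needed here: a convex combination of normalized densities is already normalized.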
Product: Multiply the distributions at each point and then renormalize.
Much more powerful than a mixture, but the normalization (a sum or integral over every possible data vector) can make learning difficult.
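A minimal sketch of a product of two 1-D Gaussians, assuming renormalization is done numerically on a grid. The helper names (`product_pdf`, `grid_std`) and parameter values are hypothetical. The product comes out narrower than either factor; for Gaussians the precisions add, so the result has variance $1/(1/\sigma_1^2 + 1/\sigma_2^2)$.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def product_pdf(x, mus, sigmas):
    """Multiply component densities pointwise, then renormalize on the grid."""
    unnormalized = np.ones_like(x)
    for m, s in zip(mus, sigmas):
        unnormalized *= gaussian_pdf(x, m, s)
    dx = x[1] - x[0]
    z = unnormalized.sum() * dx  # numerical normalizing constant
    return unnormalized / z

def grid_std(x, p):
    """Standard deviation of a density tabulated on a uniform grid."""
    dx = x[1] - x[0]
    mean = (x * p).sum() * dx
    return np.sqrt(((x - mean) ** 2 * p).sum() * dx)

x = np.linspace(-10.0, 10.0, 20001)
sigmas = [1.0, 0.8]
p_prod = product_pdf(x, [-1.0, 1.5], sigmas)
print(grid_std(x, p_prod) < min(sigmas))  # True: sharper than either factor
```

In one dimension the normalizer `z` is a cheap sum over a grid. For a model over many binary variables the same sum runs over exponentially many states, which is where the learning difficulty comes from.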
Composition: Use the values of the latent variables of one model as the data for the next model.
Learns multiple layers of representation.
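A minimal sketch of composition, assuming each layer is a binary restricted Boltzmann machine (RBM) trained with one step of contrastive divergence (CD-1). That choice of per-layer model is an assumption, and the helpers `train_rbm` and `hidden_probs`, the toy data, and the layer sizes are all hypothetical. The key line is the last one in the loop: the hidden-unit probabilities of one trained model become the training data for the next.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_rbm(data, n_hidden, epochs=50, lr=0.1):
    """Hypothetical helper: train a binary RBM with CD-1.

    Returns (W, b_visible, b_hidden).
    """
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v = np.zeros(n_visible)
    b_h = np.zeros(n_hidden)
    n = data.shape[0]
    for _ in range(epochs):
        # Positive phase: hidden probabilities driven by the data.
        h_prob = sigmoid(data @ W + b_h)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: one reconstruction step.
        v_recon = sigmoid(h_sample @ W.T + b_v)
        h_recon = sigmoid(v_recon @ W + b_h)
        # CD-1 updates: data statistics minus reconstruction statistics.
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / n
        b_v += lr * (data - v_recon).mean(axis=0)
        b_h += lr * (h_prob - h_recon).mean(axis=0)
    return W, b_v, b_h

def hidden_probs(data, rbm):
    """Latent-variable values (hidden probabilities) for the given data."""
    W, _, b_h = rbm
    return sigmoid(data @ W + b_h)

# Toy binary data: 200 examples, 16 visible units.
data = (rng.random((200, 16)) < 0.3).astype(float)

rbms = []
layer_input = data
for n_hidden in [12, 8]:
    rbm = train_rbm(layer_input, n_hidden)
    rbms.append(rbm)
    # Composition: this layer's latent values become the next layer's data.
    layer_input = hidden_probs(layer_input, rbm)

print(layer_input.shape)  # (200, 8)
```

Using hidden probabilities rather than sampled binary states as the next layer's data is one common simplification; sampling would also fit the description above.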
We would like to guarantee that the composite model improves every time we add a new layer.