
















Mixture: Take a weighted average of the distributions.

It can never be sharper than the sharpest of the individual distributions.

It's a very weak way to combine models.
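
As a minimal sketch of the mixture rule, the snippet below averages two discrete distributions over four states and compares entropies (lower entropy means sharper). The distributions `p` and `q`, the equal weights, and the helper names `entropy` and `mixture` are illustrative assumptions, not part of the original material.

```python
import math

def entropy(dist):
    """Shannon entropy in nats; a lower value means a sharper distribution."""
    return -sum(x * math.log(x) for x in dist if x > 0)

def mixture(dists, weights):
    """Weighted average of discrete distributions defined over the same states."""
    n_states = len(dists[0])
    return [sum(w * d[i] for d, w in zip(dists, weights)) for i in range(n_states)]

# Two hypothetical models that agree only on the first state.
p = [0.4, 0.4, 0.1, 0.1]
q = [0.4, 0.1, 0.4, 0.1]

mix = mixture([p, q], [0.5, 0.5])

# The mixture is at least as spread out as the sharpest component.
print("mixture:", mix)
print("entropy of p, q, mixture:", entropy(p), entropy(q), entropy(mix))
```

Because entropy is concave, the entropy of the mixture is never below that of the lowest-entropy component, which is one way to read the "never sharper" claim.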




Product: Multiply the distributions at each point and then renormalize.

Exponentially more powerful than a mixture. The normalization makes maximum likelihood learning difficult, but approximations allow us to learn anyway.
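
A small sketch of the product rule, under the same kind of assumptions: the distributions `p` and `q` and the helper names `entropy` and `product_of_experts` are hypothetical, chosen only to illustrate the idea. The normalizer `Z` is a sum over every state. That is trivial for four states, but in a high-dimensional space the sum has exponentially many terms, which is the source of the learning difficulty mentioned above.

```python
import math

def entropy(dist):
    """Shannon entropy in nats; a lower value means a sharper distribution."""
    return -sum(x * math.log(x) for x in dist if x > 0)

def product_of_experts(dists):
    """Multiply the distributions state by state, then renormalize."""
    n_states = len(dists[0])
    unnormalized = []
    for i in range(n_states):
        value = 1.0
        for d in dists:
            value *= d[i]
        unnormalized.append(value)
    Z = sum(unnormalized)  # normalizer: a sum over all states
    return [u / Z for u in unnormalized]

# Two hypothetical models that agree only on the first state.
p = [0.4, 0.4, 0.1, 0.1]
q = [0.4, 0.1, 0.4, 0.1]

prod = product_of_experts([p, q])

# The product concentrates mass where both models agree.
print("product:", prod)
print("entropy of p, q, product:", entropy(p), entropy(q), entropy(prod))
```

In this example the product puts about 0.64 of its mass on the first state, more than either model assigns to any single state, so it is sharper than both of its factors.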




Composition: Use the values of the latent variables of one model as the data for the next model.

Works well for learning multiple layers of representation, but only if the individual models are undirected.
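
The data flow of composition can be sketched with two stacked undirected models. Everything specific here is an assumption made for illustration: the `TinyRBM` class is a hypothetical, bias-free binary restricted Boltzmann machine, the toy data set is invented, and one-step contrastive divergence is simply one common approximation to maximum likelihood learning, not necessarily the method intended above.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyRBM:
    """Hypothetical minimal binary RBM (biases omitted for brevity)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]

    def hidden_probs(self, v):
        """P(h_j = 1 | v) for every hidden unit j."""
        return [sigmoid(sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.W[0]))]

    def visible_probs(self, h):
        """P(v_i = 1 | h) for every visible unit i."""
        return [sigmoid(sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.W))]

    def sample(self, probs):
        """Draw binary states from independent Bernoulli probabilities."""
        return [1 if self.rng.random() < p else 0 for p in probs]

    def train(self, data, epochs=20, lr=0.1):
        """One-step contrastive divergence (an approximate learning rule)."""
        for _ in range(epochs):
            for v0 in data:
                h0 = self.hidden_probs(v0)
                v1 = self.visible_probs(self.sample(h0))
                h1 = self.hidden_probs(v1)
                for i in range(len(v0)):
                    for j in range(len(h0)):
                        self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])

rng = random.Random(0)
data = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]]  # toy data

# Train the first model on the raw data.
layer1 = TinyRBM(n_visible=4, n_hidden=3, rng=rng)
layer1.train(data)

# Composition step: the latent values of layer1 become the data for layer2.
hidden_data = [layer1.sample(layer1.hidden_probs(v)) for v in data]

layer2 = TinyRBM(n_visible=3, n_hidden=2, rng=rng)
layer2.train(hidden_data)

# Second-layer representation of each original data vector.
top_features = [layer2.hidden_probs(h) for h in hidden_data]
print(top_features)
```

Each layer is trained on its own, and only the inferred hidden states are passed upward, so a further layer can be added by repeating the same composition step.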

