Summary so far
Restricted Boltzmann Machines provide a simple way to
learn a layer of features without any supervision.
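
As a rough illustration (not part of the original slides), the sketch below assumes NumPy and hypothetical parameters W and b_h; it only shows how an RBM's hidden units give one layer of stochastic binary features for a data vector, with no labels involved.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def hidden_features(v, W, b_h):
        # p(h_j = 1 | v): probability that each hidden feature unit turns on
        return sigmoid(v @ W + b_h)

    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.01, size=(784, 500))     # e.g. 784 pixels -> 500 features
    b_h = np.zeros(500)
    v = rng.integers(0, 2, size=784).astype(float)  # one binary "image"
    h = hidden_features(v, W, b_h)                  # a layer of features, no labels used
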
Maximum likelihood learning is computationally
expensive because of the normalization term (the partition function), but
contrastive divergence learning is fast and usually
works well.
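
A minimal sketch of one CD-1 update, reusing the NumPy sigmoid helper above (all names are mine): a single Gibbs step supplies the "negative" statistics that maximum likelihood would otherwise have to get from the full model distribution.

    def cd1_update(v0, W, b_v, b_h, lr=0.1, rng=None):
        rng = rng or np.random.default_rng()
        # Positive phase: hidden probabilities driven by the data.
        h0 = sigmoid(v0 @ W + b_h)
        h0_s = (rng.random(h0.shape) < h0).astype(float)
        # One Gibbs step: reconstruct the visibles, then re-infer the hiddens.
        v1 = sigmoid(h0_s @ W.T + b_v)
        h1 = sigmoid(v1 @ W + b_h)
        # CD-1 gradient estimate: data statistics minus reconstruction statistics.
        W += lr * (np.outer(v0, h0) - np.outer(v1, h1))
        b_v += lr * (v0 - v1)
        b_h += lr * (h0 - h1)
        return W, b_v, b_h
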
Many layers of representation can be learned by treating
the hidden states of one RBM as the visible data for
training the next RBM (a composition of experts).
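
A hedged sketch of the greedy layer-by-layer stacking, reusing cd1_update above; train_rbm and train_stack are hypothetical helpers, not code from the lecture.

    def train_rbm(data, n_hidden, epochs=10, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        n_visible = data.shape[1]
        W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
        b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
        for _ in range(epochs):
            for v0 in data:                    # plain SGD, one case at a time
                cd1_update(v0, W, b_v, b_h, lr, rng)
        return W, b_v, b_h

    def train_stack(data, layer_sizes):
        rbms, layer_input = [], data
        for n_hidden in layer_sizes:
            W, b_v, b_h = train_rbm(layer_input, n_hidden)
            rbms.append((W, b_v, b_h))
            # Hidden activities of this RBM become the "visible" data for the next one.
            layer_input = sigmoid(layer_input @ W + b_h)
        return rbms

    # e.g. rbms = train_stack(images, layer_sizes=[500, 500, 2000])
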
This creates good generative models that can then be
fine-tuned.
Contrastive wake-sleep can fine-tune generation.
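
A compressed, heavily simplified sketch of one contrastive wake-sleep (up-down) step for a small deep belief net with one directed layer under a top-level RBM. It reuses sigmoid and cd1_update above; updown_step and the names R1 (recognition weights), G1 (generative weights), and W2 (top RBM) are mine, and details such as initializing from the stacked RBMs and the exact top-level negative phase are omitted.

    def updown_step(v, R1, rb1, G1, gb_v, W2, b1, b2, lr=0.01, rng=None):
        rng = rng or np.random.default_rng()
        samp = lambda p: (rng.random(p.shape) < p).astype(float)
        # Up (wake) pass: recognition weights infer features from the data ...
        h1 = samp(sigmoid(v @ R1 + rb1))
        # ... and the generative weights are nudged to reconstruct the data.
        v_pred = sigmoid(h1 @ G1 + gb_v)
        G1 += lr * np.outer(h1, v - v_pred); gb_v += lr * (v - v_pred)
        # Top-level RBM is trained with ordinary CD-1 on the inferred features.
        cd1_update(h1, W2, b1, b2, lr, rng)
        # Down (sleep) pass: generate a fantasy from the top level downwards ...
        h2 = samp(sigmoid(h1 @ W2 + b2))
        h1_gen = samp(sigmoid(h2 @ W2.T + b1))
        v_gen = samp(sigmoid(h1_gen @ G1 + gb_v))
        # ... and the recognition weights are nudged to infer its hidden causes.
        h1_rec = sigmoid(v_gen @ R1 + rb1)
        R1 += lr * np.outer(v_gen, h1_gen - h1_rec); rb1 += lr * (h1_gen - h1_rec)
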