Training a deep network
(a quick overview)
First train a layer of features that receive input
directly from the pixels.
This is done by training a restricted
Boltzmann machine.
Then treat the activations of the trained features
as if they were pixels and learn features of
features in a second hidden layer.
Each time we add another layer of features we
improve a variational bound on how well we are
modeling the set of training images.
This assumes that the layers do not get
smaller and they are initialized correctly.