 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
• |
First train a
layer of features that receive input directly
|
|
|
from the pixels.
|
|
|
• |
Then treat the
activations of the trained features as if
|
|
|
they were pixels
and learn features of features in a
|
|
|
second hidden
layer.
|
|
|
• |
It can be proved
that each time we add another layer of
|
|
|
features we
improve a variational lower bound on the log
|
|
probability of
the training data.
|
|
|
|
– |
The
proof is slightly complicated.
|
|
|
|
– |
But
it is based on a neat equivalence between an
|
|
|
RBM
and a deep directed model (described later)
|
|