• Think of an RBM as an infinite sigmoid belief net with tied weights.
• If we start at the data, alternating Gibbs sampling computes samples from the posterior distribution in each hidden layer of the infinite net.
• In deeper layers the derivatives w.r.t. the weights are very small.
  – Contrastive divergence just ignores these small derivatives in the deeper layers of the infinite net.
  – It's silly to compute the derivatives exactly when you know the weights are going to change a lot.
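
The truncation described in the bullets can be sketched in a few lines of plain Python. This is an illustrative sketch, not the lecture's own code: the function names (`sample_hidden`, `sample_visible`, `cd_k_gradient`), the bias-free RBM, the toy weights, and the 0.1 learning rate are all assumptions made for the example. Each pass of alternating Gibbs sampling corresponds to one more layer pair of the infinite net, and stopping after `k` passes drops the contribution of everything deeper. Hidden probabilities are used in place of sampled states when collecting statistics, which is a common variance-reduction choice and also an assumption here.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_hidden(v, W, rng):
    """One up-pass: hidden probabilities and binary samples given visibles v."""
    probs = [sigmoid(sum(v[i] * W[i][j] for i in range(len(v))))
             for j in range(len(W[0]))]
    states = [1 if rng.random() < p else 0 for p in probs]
    return probs, states


def sample_visible(h, W, rng):
    """One down-pass: visible probabilities and binary samples given hiddens h.

    The same matrix W is reused in both directions (tied weights).
    """
    probs = [sigmoid(sum(h[j] * W[i][j] for j in range(len(h))))
             for i in range(len(W))]
    states = [1 if rng.random() < p else 0 for p in probs]
    return probs, states


def cd_k_gradient(v0, W, k=1, rng=None):
    """Estimate <v_i h_j>^0 - <v_i h_j>^k from k steps of alternating Gibbs.

    Hypothetical helper: no biases, a single training vector, k >= 1.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if rng is None:
        rng = random.Random(0)
    h0_probs, h = sample_hidden(v0, W, rng)   # statistics at the data
    v = v0
    for _ in range(k):                        # each pass goes one layer pair deeper
        _, v = sample_visible(h, W, rng)
        hk_probs, h = sample_hidden(v, W, rng)
    # Deeper layers are never visited: their small derivatives are ignored.
    return [[v0[i] * h0_probs[j] - v[i] * hk_probs[j]
             for j in range(len(W[0]))]
            for i in range(len(W))]


# Toy usage: 3 visible units, 2 hidden units, one CD-1 weight update.
W = [[0.1, -0.2], [0.0, 0.3], [-0.1, 0.2]]
grad = cd_k_gradient([1, 0, 1], W, k=1, rng=random.Random(0))
W = [[W[i][j] + 0.1 * grad[i][j] for j in range(2)] for i in range(3)]
```

Raising `k` moves the estimate toward the full maximum-likelihood gradient at the cost of more sampling per update, which is the trade-off the last bullet argues is not worth paying while the weights are still changing a lot.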