Why does layer-by-layer learning work?
The weights, W, in the bottom-level RBM define p(v|h), and they also, indirectly, define p(h). So we can express the RBM model as

$$ p(v) = \sum_{h} p(v, h) = \sum_{h} p(h)\, p(v \mid h) $$

where p(v|h) is the conditional probability, p(v, h) is the joint probability, and the sum indexes over all hidden vectors h.
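To make the two roles of W concrete, here is a minimal NumPy sketch for a hypothetical tiny binary RBM with 3 visible and 2 hidden units and random parameters. The visible biases a, the hidden biases b, and the energy E(v, h) = -a.v - b.h - v.W.h are standard RBM assumptions not stated on the slide. The sketch computes p(v|h) directly from W, gets p(h) indirectly by summing the joint over all visible vectors (which factorizes in closed form), forms p(v) as a sum over all hidden vectors of p(h) p(v|h), and checks both marginals against brute-force enumeration of exp(-E(v, h))/Z. Enumeration is only feasible because the model is tiny.

```python
import itertools
import numpy as np

# Hypothetical tiny binary RBM: 3 visible units, 2 hidden units.
# Assumed energy: E(v, h) = -a.v - b.h - v.W.h, with W of shape (visible, hidden).
rng = np.random.default_rng(0)
n_vis, n_hid = 3, 2
W = rng.normal(scale=0.5, size=(n_vis, n_hid))
a = rng.normal(scale=0.1, size=n_vis)   # visible biases (assumed)
b = rng.normal(scale=0.1, size=n_hid)   # hidden biases (assumed)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def all_states(n):
    """All 2**n binary vectors of length n, one per row."""
    return np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)

V, H = all_states(n_vis), all_states(n_hid)

def p_v_given_h(v, h):
    """W used directly: visible units are conditionally independent given h."""
    p_on = sigmoid(a + W @ h)
    return np.prod(np.where(v == 1, p_on, 1.0 - p_on))

# W used indirectly: p(h) is proportional to sum_v exp(-E(v, h)),
# which factorizes into exp(b.h) * prod_i (1 + exp(a_i + (W h)_i)).
p_h = np.array([np.exp(b @ h) * np.prod(1.0 + np.exp(a + W @ h)) for h in H])
p_h /= p_h.sum()

# p(v) = sum over all hidden vectors h of p(h) * p(v | h)
p_v = np.array([sum(p_h[j] * p_v_given_h(v, h) for j, h in enumerate(H)) for v in V])

# Brute-force check against the normalized joint exp(-E(v, h)) / Z.
joint = np.array([[np.exp(a @ v + b @ h + v @ W @ h) for h in H] for v in V])
joint /= joint.sum()
assert np.allclose(p_h, joint.sum(axis=0))   # p(h) is the hidden marginal
assert np.allclose(p_v, joint.sum(axis=1))   # p(v) is the visible marginal
```

Only p(v|h) reads W in a single step; p(h) needs a sum over every visible configuration, which is what "indirectly" amounts to here.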
If we leave p(v|h) alone and build a better model of p(h), we will improve p(v).

We need a better model of the aggregated posterior over hidden vectors: the distribution of hidden vectors produced by applying W to the data.
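The claim can be checked numerically in a minimal sketch, assuming a hypothetical tiny RBM and a made-up four-case data set (none of these values come from the lecture). Hold p(v|h) fixed and replace the RBM's own prior p(h) with the aggregated posterior, the average over training cases of p(h|v). Because this sketch stores W with shape (visible, hidden), "applying W to the data" appears as sigmoid(b + W.T @ v). With p(v|h) fixed, p(v) is a mixture over hidden vectors with weights p(h), and this replacement is exactly one EM update of those weights, so the data log-likelihood cannot decrease.

```python
import itertools
import numpy as np

# Hypothetical tiny binary RBM (3 visible, 2 hidden) and hypothetical toy data.
rng = np.random.default_rng(1)
n_vis, n_hid = 3, 2
W = rng.normal(scale=0.5, size=(n_vis, n_hid))
a = rng.normal(scale=0.1, size=n_vis)   # visible biases (assumed)
b = rng.normal(scale=0.1, size=n_hid)   # hidden biases (assumed)
data = np.array([[1, 1, 0], [1, 1, 1], [0, 0, 1], [1, 1, 0]], dtype=float)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

H = np.array(list(itertools.product([0, 1], repeat=n_hid)), dtype=float)

def bernoulli_prob(x, p_on):
    """Probability of binary vector x under independent Bernoulli units."""
    return np.prod(np.where(x == 1, p_on, 1.0 - p_on))

# p(v_n | h) for every data case and hidden vector; held fixed throughout.
lik = np.array([[bernoulli_prob(v, sigmoid(a + W @ h)) for h in H] for v in data])

# The RBM's own prior p(h), from summing out v in closed form.
prior_rbm = np.array([np.exp(b @ h) * np.prod(1.0 + np.exp(a + W @ h)) for h in H])
prior_rbm /= prior_rbm.sum()

# Posterior p(h | v_n): factorial, obtained by applying W (here W.T) to the data.
post = np.array([[bernoulli_prob(h, sigmoid(b + W.T @ v)) for h in H] for v in data])

# These posteriors are the EM responsibilities p(h) p(v_n | h) / p(v_n).
assert np.allclose(post, prior_rbm * lik / (lik @ prior_rbm)[:, None])

# Aggregated posterior: average the per-case posteriors over the data.
prior_agg = post.mean(axis=0)

def log_likelihood(prior):
    """sum_n log sum_h p(h) p(v_n | h), with p(v | h) held fixed."""
    return np.sum(np.log(lik @ prior))

print(log_likelihood(prior_rbm), log_likelihood(prior_agg))
```

The aggregated posterior is stored here as an explicit table over all 2**n_hid hidden vectors, which only works for a toy model. In layer-by-layer learning the better model of p(h) has to be learned from the hidden vectors themselves, for example by training the next layer on them.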