Why does greedy learning work?
The weights, W, in the bottom-level RBM define p(v|h) and they also, indirectly, define p(h).
So we can express the RBM model as
p(v) = Σ_h p(h) p(v|h)
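The factorisation of the RBM into a prior p(h) and a conditional p(v|h) can be checked numerically on a model small enough to enumerate. The following is only a sketch: the layer sizes, the random parameters `W`, `b`, `c`, and the standard binary-RBM energy E(v, h) = -v·W·h - b·v - c·h are assumptions chosen for illustration, not values from the lecture.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_v, n_h = 3, 2  # hypothetical sizes, small enough to enumerate exactly

# Hypothetical RBM parameters (random, for illustration only).
W = rng.normal(size=(n_v, n_h))  # visible-to-hidden weights
b = rng.normal(size=n_v)         # visible biases
c = rng.normal(size=n_h)         # hidden biases

# Every binary visible vector (rows of V) and hidden vector (rows of H).
V = np.array(list(itertools.product([0, 1], repeat=n_v)), dtype=float)
H = np.array(list(itertools.product([0, 1], repeat=n_h)), dtype=float)

# Joint p(v, h) is proportional to exp(-E(v, h)); entry [i, j] pairs V[i] with H[j].
neg_energy = V @ W @ H.T + (V @ b)[:, None] + (H @ c)[None, :]
joint = np.exp(neg_energy)
joint /= joint.sum()

p_v = joint.sum(axis=1)              # marginal over visible vectors
p_h = joint.sum(axis=0)              # prior over hidden vectors, defined indirectly by W
p_v_given_h = joint / p_h[None, :]   # each column is a distribution over v

# Rebuild p(v) from the two factors: p(v) = sum_h p(h) p(v|h).
p_v_rebuilt = p_v_given_h @ p_h
print(np.allclose(p_v_rebuilt, p_v))
```

Both `p_h` and `p_v_given_h` are computed from the same `W`, which is the sense in which the weights define p(v|h) directly and p(h) indirectly.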
If we leave p(v|h) alone and improve p(h), we will improve p(v).
To improve p(h), we need it to be a better model of the aggregated posterior distribution over hidden vectors produced by applying W to the data.
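A small numerical sketch of this argument, under stated assumptions: the RBM below is tiny and random, the "dataset" `data_idx` is made up, and p(h) is replaced directly by the aggregated posterior as an idealised best case. In practice a higher-level model would only approximate that distribution, so this illustrates the direction of the effect rather than the actual training procedure.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n_v, n_h = 3, 2  # hypothetical sizes, small enough to enumerate exactly

# Hypothetical RBM parameters (random, for illustration only).
W = rng.normal(size=(n_v, n_h))
b = rng.normal(size=n_v)
c = rng.normal(size=n_h)

# Every binary visible vector (rows of V) and hidden vector (rows of H).
V = np.array(list(itertools.product([0, 1], repeat=n_v)), dtype=float)
H = np.array(list(itertools.product([0, 1], repeat=n_h)), dtype=float)

# Exact joint distribution of the RBM, p(v, h) proportional to exp(-E(v, h)).
joint = np.exp(V @ W @ H.T + (V @ b)[:, None] + (H @ c)[None, :])
joint /= joint.sum()

p_v = joint.sum(axis=1)
p_h = joint.sum(axis=0)
p_v_given_h = joint / p_h[None, :]   # columns are distributions over v
p_h_given_v = joint / p_v[:, None]   # rows are distributions over h

# Made-up training set, given as row indices into V.
data_idx = np.array([0, 3, 3, 5, 7, 7, 7])

# Aggregated posterior: the average of p(h|v) over the training vectors.
agg_posterior = p_h_given_v[data_idx].mean(axis=0)

# Keep p(v|h) fixed and swap the RBM's own p(h) for the aggregated posterior.
p_v_new = p_v_given_h @ agg_posterior

old_ll = np.log(p_v[data_idx]).mean()
new_ll = np.log(p_v_new[data_idx]).mean()
print("average log-likelihood, original prior:    ", old_ll)
print("average log-likelihood, aggregated posterior:", new_ll)
```

The swap is one EM-style update of the prior with p(v|h) held fixed, so the average log-likelihood of the training vectors cannot decrease. The comparison concerns the training data only; it says nothing about held-out data.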