NIPS 2007 Tutorial on Deep Belief Nets

Why does greedy learning work?


The weights, W, in the bottom-level RBM define
p(v|h), and they also, indirectly, define p(h).

So we can express the RBM model as

$$p(v) = \sum_{h} p(h)\, p(v \mid h)$$

If we leave p(v|h) alone and improve p(h), we will
improve p(v) for the training data.
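To make the decomposition concrete, here is a minimal sketch in Python. It assumes a hypothetical RBM with 3 binary visible units, 2 binary hidden units and arbitrary illustrative weights. The sketch enumerates every state exactly and computes p(h) by marginalizing the RBM's joint distribution. It computes p(v|h) from the usual factorial logistic conditional. Finally, it checks that summing p(h) p(v|h) over h reproduces the RBM's own p(v). Exact enumeration only works at toy sizes. It is used here purely to check the identity.

```python
import itertools
import math

# Hypothetical tiny RBM: 3 binary visible units, 2 binary hidden units.
# Weights and biases are arbitrary illustrative values.
W = [[0.5, -1.0],
     [1.5, 0.3],
     [-0.7, 0.8]]          # W[i][j] connects visible unit i to hidden unit j
b = [0.1, -0.2, 0.3]       # visible biases
c = [-0.4, 0.6]            # hidden biases

V = list(itertools.product([0, 1], repeat=3))  # all visible vectors
H = list(itertools.product([0, 1], repeat=2))  # all hidden vectors

def neg_energy(v, h):
    """-E(v, h) = v.b + h.c + v^T W h"""
    s = sum(vi * bi for vi, bi in zip(v, b)) + sum(hj * cj for hj, cj in zip(h, c))
    return s + sum(v[i] * W[i][j] * h[j] for i in range(3) for j in range(2))

# Partition function by brute-force enumeration (toy sizes only).
Z = sum(math.exp(neg_energy(v, h)) for v in V for h in H)

def p_joint(v, h):
    return math.exp(neg_energy(v, h)) / Z

def p_h(h):
    # Prior over hidden vectors, defined only indirectly by W.
    return sum(p_joint(v, h) for v in V)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def p_v_given_h(v, h):
    # Factorial conditional: v_i ~ Bernoulli(sigmoid(b_i + sum_j W_ij h_j)).
    prob = 1.0
    for i in range(3):
        q = sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(2)))
        prob *= q if v[i] == 1 else 1.0 - q
    return prob

def p_v_direct(v):
    # Marginal of the RBM's joint distribution.
    return sum(p_joint(v, h) for h in H)

def p_v_decomposed(v):
    # p(v) = sum_h p(h) p(v|h)
    return sum(p_h(h) * p_v_given_h(v, h) for h in H)

for v in V:
    print(v, round(p_v_direct(v), 6), round(p_v_decomposed(v), 6))
```

Both columns agree for every v. That is the sense in which the same W specifies the RBM both as p(v|h) and as an implicit p(h).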

To improve p(h), we need it to be a better model of
the aggregated posterior distribution over hidden
vectors produced by applying W to the data, i.e. the
average of the posteriors p(h|v) over the training vectors.
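The claim that a better model of the aggregated posterior improves p(v) can be illustrated with a small Python sketch. Its assumptions are ours, not the tutorial's. It uses a hypothetical tiny RBM and a hypothetical five-vector data set. An explicit lookup table stands in for the "better model" of p(h). Greedy learning would instead fit p(h) with a higher-level model, such as another RBM. Holding p(v|h) fixed, p(v) is a mixture over h with mixing weights p(h). The RBM's factorial posterior p(h|v) is exactly the responsibility of each h under that mixture. Setting the mixing weights to the aggregated posterior is therefore one EM update of those weights, so the data log-likelihood cannot decrease.

```python
import itertools
import math

# Hypothetical tiny RBM (3 visible, 2 hidden units), illustrative weights only.
W = [[0.5, -1.0], [1.5, 0.3], [-0.7, 0.8]]
b = [0.1, -0.2, 0.3]
c = [-0.4, 0.6]
NV, NH = 3, 2
V = list(itertools.product([0, 1], repeat=NV))
H = list(itertools.product([0, 1], repeat=NH))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bernoulli_prob(x, probs):
    # Probability of binary vector x under independent Bernoulli(probs).
    out = 1.0
    for xi, p in zip(x, probs):
        out *= p if xi == 1 else 1.0 - p
    return out

def p_h_given_v(h, v):
    # Factorial RBM posterior: p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i W_ij).
    probs = [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(NV))) for j in range(NH)]
    return bernoulli_prob(h, probs)

def p_v_given_h(v, h):
    # Factorial RBM conditional: p(v_i = 1 | h) = sigmoid(b_i + sum_j W_ij h_j).
    probs = [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(NH))) for i in range(NV)]
    return bernoulli_prob(v, probs)

def neg_energy(v, h):
    s = sum(vi * bi for vi, bi in zip(v, b)) + sum(hj * cj for hj, cj in zip(h, c))
    return s + sum(v[i] * W[i][j] * h[j] for i in range(NV) for j in range(NH))

Z = sum(math.exp(neg_energy(v, h)) for v in V for h in H)
# The RBM's own prior p(h), defined indirectly by W.
rbm_prior = {h: sum(math.exp(neg_energy(v, h)) for v in V) / Z for h in H}

# Hypothetical training data (binary visible vectors).
data = [(1, 1, 0), (1, 1, 0), (0, 1, 1), (1, 0, 0), (1, 1, 1)]

# Aggregated posterior: average of p(h | v) over the training vectors.
agg_posterior = {h: sum(p_h_given_v(h, v) for v in data) / len(data) for h in H}

def log_likelihood(prior):
    # sum_n log sum_h prior(h) p(v_n | h), with p(v|h) held fixed.
    return sum(math.log(sum(prior[h] * p_v_given_h(v, h) for h in H)) for v in data)

ll_rbm = log_likelihood(rbm_prior)
ll_agg = log_likelihood(agg_posterior)
print(f"log p(data) with the RBM's p(h):       {ll_rbm:.4f}")
print(f"log p(data) with aggregated posterior: {ll_agg:.4f}")
```

The second printed value is never smaller than the first. How large the gap is depends on the illustrative weights and data. Repeating the update would continue to raise the likelihood until it reached the best prior for this fixed p(v|h).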