

















• Each RBM converts its data distribution into an aggregated posterior distribution over its hidden units.
• This divides the task of modeling its data into two tasks:
  – Task 1: Learn generative weights that can convert the aggregated posterior distribution over the hidden units back into the data distribution.
  – Task 2: Learn to model the aggregated posterior distribution over the hidden units.
  – The RBM does a good job of task 1 and a moderately good job of task 2.
• Task 2 is easier (for the next RBM) than modeling the original data because the aggregated posterior distribution is closer to a distribution that an RBM can model perfectly.
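
The points above can be illustrated with a small sketch. The aggregated posterior is the average of p(h | v) over the training cases, so drawing one hidden vector per training case gives samples from it, and those samples become the training data for the next RBM (task 2). Everything in the code is an assumption made for illustration: the toy binary data, the layer sizes, the helper names (train_rbm, sample_aggregated_posterior), and the use of one-step contrastive divergence (CD-1), which the slide does not specify.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def hidden_probs(v, W, b_hid):
    """p(h_j = 1 | v) for every row (training case) of v."""
    return sigmoid(v @ W + b_hid)


def visible_probs(h, W, b_vis):
    """p(v_i = 1 | h), using the same weights in the generative direction."""
    return sigmoid(h @ W.T + b_vis)


def train_rbm(data, n_hidden, rng, epochs=50, lr=0.1):
    """Full-batch CD-1 training (an assumed training rule, kept minimal)."""
    n_cases, n_visible = data.shape
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_vis = np.zeros(n_visible)
    b_hid = np.zeros(n_hidden)
    for _ in range(epochs):
        # Positive phase: hidden units driven by the data.
        p_h0 = hidden_probs(data, W, b_hid)
        h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
        # Negative phase: one reconstruction step.
        p_v1 = visible_probs(h0, W, b_vis)
        p_h1 = hidden_probs(p_v1, W, b_hid)
        W += lr * (data.T @ p_h0 - p_v1.T @ p_h1) / n_cases
        b_vis += lr * (data - p_v1).mean(axis=0)
        b_hid += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_vis, b_hid


def sample_aggregated_posterior(data, W, b_hid, rng):
    """One hidden sample per training case.

    Pooled over all cases, these are samples from the aggregated
    posterior, i.e. the average of p(h | v) over the training data.
    """
    p_h = hidden_probs(data, W, b_hid)
    return (rng.random(p_h.shape) < p_h).astype(float)


rng = np.random.default_rng(0)
data = (rng.random((200, 12)) < 0.3).astype(float)  # made-up binary data

# First RBM: its weights handle task 1 (hidden units back to data).
W1, b_vis1, b_hid1 = train_rbm(data, n_hidden=8, rng=rng)

# Task 2 is handed to the next RBM, which treats samples from the
# aggregated posterior of the first RBM as its own data.
hidden_data = sample_aggregated_posterior(data, W1, b_hid1, rng)
W2, b_vis2, b_hid2 = train_rbm(hidden_data, n_hidden=6, rng=rng)
```

The second call to train_rbm never sees the original data; it only sees hidden_data, which is why the quality of the stack depends on how much easier the aggregated posterior is to model than the data itself.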

