Why does greedy learning work?
Each RBM converts its data distribution
into an aggregated posterior distribution
over its hidden units.
This splits the job of modeling its
data into two tasks:
Task 1: Learn generative weights
that can convert the aggregated
posterior distribution over the hidden
units back into the data distribution.
Task 2: Learn to model the
aggregated posterior distribution
over the hidden units.
The RBM does a good job of task 1
and a moderately good job of task 2.
Task 2 is easier (for the next RBM) than
modeling the original data because the
aggregated posterior distribution is
closer to a distribution that an RBM can
model perfectly.
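
In symbols (a sketch; the notation below is assumed, not on the slide):
with visible units v, hidden units h, and the first RBM's generative
weights W, the two tasks correspond to the two factors of

\[
  p(v) \;=\; \sum_{h} \underbrace{p(h)}_{\text{Task 2}} \;
             \underbrace{p(v \mid h, W)}_{\text{Task 1}}
\]

The first RBM fixes p(v | h, W); the next RBM is then trained to improve
the prior p(h) by modeling the aggregated posterior.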
[Diagram: a two-stage stack. Task 2: model the aggregated posterior
distribution on the hidden units. Task 1: convert that distribution back
into the data distribution on the visible units.]
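
A minimal sketch of the greedy procedure (an assumed NumPy implementation,
not code from the lecture): the first RBM is trained on the data with
one-step contrastive divergence (CD-1), its hidden activities are sampled
to form the aggregated posterior, and the next RBM treats those samples as
its own data. Sizes, learning rate, and epoch counts are illustrative.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0):
        # Positive phase: sample hidden states from the posterior given data.
        ph0 = self.hidden_probs(v0)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        # Negative phase: one step of Gibbs sampling (reconstruction).
        pv1 = self.visible_probs(h0)
        ph1 = self.hidden_probs(pv1)
        # CD-1 approximation to the log-likelihood gradient.
        batch = v0.shape[0]
        self.W   += self.lr * (v0.T @ ph0 - pv1.T @ ph1) / batch
        self.b_v += self.lr * (v0 - pv1).mean(axis=0)
        self.b_h += self.lr * (ph0 - ph1).mean(axis=0)

# Toy binary data standing in for the data distribution on visible units.
data = (rng.random((500, 20)) < 0.3).astype(float)

# Task 1: the first RBM learns generative weights that map its
# aggregated posterior back to the data distribution.
rbm1 = RBM(n_visible=20, n_hidden=15)
for _ in range(50):
    rbm1.cd1_step(data)

# The aggregated posterior: hidden activities sampled for every data case.
posterior = (rng.random((500, 15)) < rbm1.hidden_probs(data)).astype(float)

# Task 2: the next RBM treats those samples as *its* data distribution,
# which is closer to something an RBM can model perfectly.
rbm2 = RBM(n_visible=15, n_hidden=10)
for _ in range(50):
    rbm2.cd1_step(posterior)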