Why does greedy learning work?
Each RBM converts its data distribution into a posterior distribution over its hidden units. This divides the task of modeling its data into two tasks:
Task 1: Learn generative weights that can convert the posterior distribution over the hidden units back into the data.
Task 2: Learn to model the posterior distribution over the hidden units.
The RBM does a good job of Task 1 and a not-so-good job of Task 2 (both tasks are sketched in code below).
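A minimal sketch of both tasks for a single binary RBM, in NumPy. The layer sizes, weights, biases, and toy data vector are hypothetical stand-ins, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_visible, n_hidden = 6, 4
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))  # shared weights (hypothetical)
b_vis = np.zeros(n_visible)                            # visible biases
b_hid = np.zeros(n_hidden)                             # hidden biases

v = rng.integers(0, 2, size=n_visible).astype(float)   # a toy binary data vector

# Task 2's target: the posterior over hidden units given the data.
# In an RBM the hidden units are conditionally independent given v,
# so the posterior factorizes into per-unit Bernoulli probabilities.
p_h_given_v = sigmoid(v @ W + b_hid)
h = (rng.random(n_hidden) < p_h_given_v).astype(float)  # a posterior sample

# Task 1: the generative weights (the transpose of the same W) convert
# a hidden configuration back into a distribution over the data.
p_v_given_h = sigmoid(h @ W.T + b_vis)
print("posterior over hidden units: ", p_h_given_v)
print("reconstruction probabilities:", p_v_given_h)
```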
Task 2 is easier (for the next RBM) than modeling the original data, because the posterior distribution over the hidden units is closer to a distribution that an RBM can model perfectly.
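Greedy learning falls directly out of this decomposition: fit one RBM to the data, freeze it, and hand Task 2 to a second RBM whose training data are the first RBM's posterior activations. The sketch below assumes one-step contrastive divergence (CD-1) as the training rule; the layer sizes, learning rate, epoch count, and random data are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=50, lr=0.1):
    """Fit a binary RBM to `data` (one example per row) with CD-1."""
    n_visible = data.shape[1]
    W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
    b_vis, b_hid = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        # Positive phase: posterior over hidden units given the data.
        ph = sigmoid(data @ W + b_hid)
        h = (rng.random(ph.shape) < ph).astype(float)
        # Negative phase: one step of reconstruction.
        pv = sigmoid(h @ W.T + b_vis)
        ph_recon = sigmoid(pv @ W + b_hid)
        # Approximate maximum-likelihood gradient (CD-1).
        W += lr * (data.T @ ph - pv.T @ ph_recon) / len(data)
        b_vis += lr * (data - pv).mean(axis=0)
        b_hid += lr * (ph - ph_recon).mean(axis=0)
    return W, b_vis, b_hid

data = (rng.random((100, 6)) < 0.5).astype(float)  # toy binary data

# Greedy step 1: the first RBM models the data
# (Task 1 done well, Task 2 less well).
W1, a1, b1 = train_rbm(data, n_hidden=4)

# Greedy step 2: hand Task 2 to the next RBM, whose "data" is the
# posterior over the first RBM's hidden units.
posterior = sigmoid(data @ W1 + b1)
W2, a2, b2 = train_rbm(posterior, n_hidden=3)
```

Repeating the second step layer by layer gives the usual greedily trained stack, each new RBM modeling the posterior activations of the one below it.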
[Diagram: Task 1 maps the posterior distribution on the hidden units back down to the data distribution on the visible units; Task 2 models the posterior distribution on the hidden units.]