Learning multiple layers of features greedily

A guarantee

•

Can we prove that adding more layers will always help?

–

It would be very nice if we could learn a big model

one layer at a time and guarantee that as we add

each new hidden layer the model gets better.

•

We can actually guarantee the following:

–

There is a lower bound on the log probability of the

data.

–

Provided that the layers do not get smaller and the

weights are initialized correctly (which is easy), every

time we learn a new hidden layer this bound is

improved (unless its already maximized).

•

The derivation of this guarantee is quite complicated.