Why greedy learning works
Each time we learn a new layer, the inference at the layer below becomes incorrect, but the variational bound on the log probability of the data improves.
Since the bound starts as an equality, learning a new layer never decreases the log probability of the data, provided we start the learning from the tied weights that implement the complementary prior.
Now that we have a guarantee, we can loosen the restrictions and still feel confident (a code sketch of the relaxed greedy procedure follows this list):
- Allow the layers to vary in size.
- Do not start the learning at each layer from the weights in the layer below.
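As a concrete illustration of the greedy procedure with these relaxations, here is a minimal sketch that stacks binary RBMs of differing sizes and trains each with one-step contrastive divergence. The RBM class and greedy_train function are our own illustrative names, not code from the tutorial:

```python
# Minimal sketch: greedy layer-wise training of stacked binary RBMs.
# `RBM` and `greedy_train` are illustrative names (an assumption, not
# any particular library's API).
import numpy as np

class RBM:
    """Restricted Boltzmann Machine trained with 1-step contrastive divergence."""
    def __init__(self, n_visible, n_hidden, lr=0.1, rng=None):
        self.rng = rng or np.random.default_rng(0)
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)
        self.lr = lr

    def _sigmoid(self, x):
        return 1.0 / (1.0 + np.exp(-x))

    def hidden_probs(self, v):
        return self._sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return self._sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0):
        # Positive phase: hidden probabilities and a binary sample from the data.
        h0_prob = self.hidden_probs(v0)
        h0 = (self.rng.random(h0_prob.shape) < h0_prob).astype(float)
        # Negative phase: one step of alternating Gibbs sampling.
        v1_prob = self.visible_probs(h0)
        h1_prob = self.hidden_probs(v1_prob)
        # CD-1 parameter updates, averaged over the mini-batch.
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / n
        self.b_v += self.lr * (v0 - v1_prob).mean(axis=0)
        self.b_h += self.lr * (h0_prob - h1_prob).mean(axis=0)

def greedy_train(data, layer_sizes, epochs=10):
    """Train a stack of RBMs one layer at a time.

    Each new RBM is trained on the hidden activities of the frozen layer
    below; layer sizes are free to differ, and weights are initialized
    fresh rather than copied from the layer below."""
    rbms, x = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(n_visible=x.shape[1], n_hidden=n_hidden)
        for _ in range(epochs):
            rbm.cd1_step(x)
        rbms.append(rbm)
        # The hidden probabilities become the "data" for the next layer.
        x = rbm.hidden_probs(x)
    return rbms

# Example: three layers of different sizes on random binary data.
data = (np.random.default_rng(1).random((64, 784)) < 0.5).astype(float)
stack = greedy_train(data, layer_sizes=[500, 500, 2000], epochs=5)
```

Note that each layer is trained on the activities of the frozen layer below, and nothing forces its weights to start from, or match the shape of, the weights beneath it.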