A guarantee
Can we prove that adding more layers will always help?
It would be very nice if we could learn a big model
one layer at a time and guarantee that as we add
each new hidden layer the model gets better.
We can actually guarantee the following:
There is a lower bound on the log probability of the
data.
Provided that the layers do not get smaller and the
weights are initialized correctly (which is easy), every
time we learn a new hidden layer this bound is
improved (unless its already maximized).
The derivation of this guarantee is quite complicated.