• Can we prove that adding more layers will always help?
  – It would be very nice if we could learn a big model one layer at a time and guarantee that as we add each new hidden layer the model gets better.
• We can actually guarantee the following:
  – There is a lower bound on the log probability of the data.
  – Provided that the layers do not get smaller and the weights are initialized correctly (which is easy), every time we learn a new hidden layer this bound is improved (unless it is already maximized).
• The derivation of this guarantee is quite complicated.
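For orientation, the lower bound mentioned above is usually written in the standard variational form sketched below. The symbols are assumptions introduced here, not taken from the slide: v is a data vector, h is the first hidden layer, p is the generative model, and Q(h | v) is the approximate posterior defined by the already-learned lower layer.

```latex
\log p(v) \;\ge\; \sum_{h} Q(h \mid v)\,\bigl[\log p(h) + \log p(v \mid h)\bigr]
\;-\; \sum_{h} Q(h \mid v)\,\log Q(h \mid v)
```

The bound holds with equality when Q(h | v) is the true posterior p(h | v). Read this way, learning a new hidden layer while keeping Q(h | v) and p(v | h) fixed amounts to improving the model of log p(h) on samples from Q, which can only raise the right-hand side. Note that this is a statement about the bound: the log probability of the data itself is not guaranteed to increase at every step.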