














• 
Each time we
learn a new layer, the inference at the



layer below
becomes incorrect, but the variational bound


on the log prob
of the data improves.



• 
Since the bound
starts as an equality, learning a new



layer never
decreases the log prob of the data, provided



we start the
learning from the tied weights that



implement the
complementary prior.



• 
Now that we have
a guarantee we can loosen the



restrictions and
still feel confident.




– 
Allow
layers to vary in size.




– 
Do
not start the learning at each layer from the



weights
in the layer below.

