Back-fitting
After we have learned all the layers greedily, the weights
in the lower layers will no longer be optimal. We can
improve them in two ways:
- Untie the recognition weights from the generative
  weights and learn recognition weights that take into
  account the non-complementary prior implemented by
  the weights in higher layers.
- Improve the generative weights to take into account
  the non-complementary priors implemented by the
  weights in higher layers.
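
To make "untying" concrete, here is a minimal sketch for a single pair of layers. It assumes binary stochastic units and no biases. The names `untie`, `sample_layer` and `wake_step` are hypothetical, and the delta-rule update is borrowed from the wake phase of wake-sleep purely as an illustration; the slide itself does not commit to an update rule at this point. Only the generative weights are adjusted here; the recognition weights would need their own update.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_layer(inputs, weights, rng):
    """Sample binary units; weights[i][j] connects input i to output j."""
    n_out = len(weights[0])
    probs = [
        sigmoid(sum(inputs[i] * weights[i][j] for i in range(len(inputs))))
        for j in range(n_out)
    ]
    states = [1.0 if rng.random() < p else 0.0 for p in probs]
    return states, probs


def untie(tied):
    """Split one tied matrix (visible x hidden) into two independent copies."""
    recognition = [row[:] for row in tied]          # visible -> hidden
    generative = [list(col) for col in zip(*tied)]  # hidden -> visible
    return recognition, generative


def wake_step(v, recognition, generative, rng, lr=0.1):
    """Bottom-up pass, then update the generative weights only (assumed rule)."""
    h, _ = sample_layer(v, recognition, rng)
    _, v_prob = sample_layer(h, generative, rng)
    for j in range(len(h)):
        for i in range(len(v)):
            # Delta rule: make the top-down prediction match the data below.
            generative[j][i] += lr * h[j] * (v[i] - v_prob[i])
    return h


rng = random.Random(0)
# Stand-in for greedily learned weights: 4 visible units, 3 hidden units.
tied = [[rng.gauss(0.0, 0.1) for _ in range(3)] for _ in range(4)]
recognition, generative = untie(tied)
before = [row[:] for row in recognition]
wake_step([1.0, 0.0, 1.0, 1.0], recognition, generative, rng)
```

After `untie`, the two matrices start out as transposes of each other but are stored separately, so the generative update in `wake_step` leaves the recognition weights unchanged.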
What algorithm should we use to improve the
weights that were learned greedily?