• After we have learned all the layers greedily, the weights in the lower layers will no longer be optimal. We can improve them in two ways:

  – Untie the recognition weights from the generative weights and learn recognition weights that take into account the non-complementary prior implemented by the weights in higher layers (see the sketch below).
  – Improve the generative weights to take into account the non-complementary priors implemented by the weights in higher layers.
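To make the untying concrete, here is a minimal sketch, assuming NumPy and a single directed layer with hypothetical sizes: after greedy learning the recognition and generative weights are transposed copies of one matrix, and untying simply means keeping two independent copies that fine-tuning can update separately.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes; W is the weight matrix learned greedily.
n_visible, n_hidden = 784, 500
W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))

# Tied during greedy learning: recognition uses W, generation uses W.T.
# Untying keeps two independent copies for fine-tuning to adjust.
R = W.copy()      # recognition weights: visible -> hidden
G = W.T.copy()    # generative weights:  hidden  -> visible
```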
• What algorithm should we use for fine-tuning the weights that are learned greedily?
  – We use a contrastive version of the “wake-sleep” algorithm. This is explained in the written paper. It will not be described in the talk.
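Although the contrastive wake-sleep (“up-down”) pass is only detailed in the written paper, a rough sketch of one step helps fix ideas. This is a simplified, hypothetical NumPy version for a single directed layer beneath a top-level RBM; the sizes, names, CD-1 shortcut at the top level, and omission of biases are all assumptions, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p):
    """Sample binary units from Bernoulli probabilities p."""
    return (rng.random(p.shape) < p).astype(float)

# Hypothetical sizes and learning rate.
n_v, n_h1, n_h2 = 784, 500, 1000
lr = 0.01

# R, G: untied recognition/generative weights for the directed layer
# (initialized as tied copies, as after greedy pretraining);
# W: top-level RBM weights (undirected).
R = rng.normal(0.0, 0.01, size=(n_v, n_h1))
G = R.T.copy()
W = rng.normal(0.0, 0.01, size=(n_h1, n_h2))

def contrastive_wake_sleep_step(v0):
    global R, G, W
    # Up-pass: stochastic bottom-up inference with recognition weights.
    h1 = sample(sigmoid(v0 @ R))

    # Wake phase: adjust generative weights so the generative model
    # reconstructs v0 from the inferred h1 state.
    v_recon = sigmoid(h1 @ G)
    G += lr * np.outer(h1, v0 - v_recon)

    # Top-level RBM: a brief Gibbs alternation (CD-1 here) in the
    # associative memory, then a contrastive-divergence update.
    h2_p = sigmoid(h1 @ W)
    h2 = sample(h2_p)
    h1_neg = sample(sigmoid(h2 @ W.T))
    h2_neg_p = sigmoid(h1_neg @ W)
    W += lr * (np.outer(h1, h2_p) - np.outer(h1_neg, h2_neg_p))

    # Down-pass: generate a fantasy visible vector top-down from the
    # end of the Gibbs chain, using the generative weights.
    v_gen = sample(sigmoid(h1_neg @ G))

    # Sleep phase: adjust recognition weights so inference recovers the
    # h1 state that actually generated v_gen.
    h1_rec = sigmoid(v_gen @ R)
    R += lr * np.outer(v_gen, h1_neg - h1_rec)

# Usage: one fine-tuning step on a (here random) binary data vector.
v = sample(np.full(n_v, 0.5))
contrastive_wake_sleep_step(v)
```

The wake phase trains the generative weights on states produced by recognition, the sleep phase trains the recognition weights on states produced by generation, and the top-level RBM is trained by contrastive divergence; this is how the fine-tuning can accommodate the non-complementary priors mentioned above.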