• Restricted Boltzmann Machines provide a simple way to learn a layer of features without any supervision.
  – Maximum likelihood learning is computationally expensive because of the normalization term, but contrastive divergence learning is fast and usually works well (sketched below).
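As a concrete illustration, here is a minimal numpy sketch of CD-1 training for a binary RBM. It is not from the slide: all names (`train_rbm`, `lr`, `epochs`) are illustrative, and it assumes the data rows are binary (0/1) vectors.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, lr=0.1, epochs=10, rng=None):
    """Train a binary RBM with one step of contrastive divergence (CD-1)."""
    rng = rng or np.random.default_rng(0)
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v = np.zeros(n_visible)   # visible biases
    b_h = np.zeros(n_hidden)    # hidden biases
    for _ in range(epochs):
        # Positive phase: sample hidden units given the data.
        p_h = sigmoid(data @ W + b_h)
        h = (rng.random(p_h.shape) < p_h).astype(float)
        # Negative phase: one Gibbs step (reconstruct visibles, re-infer hiddens).
        p_v = sigmoid(h @ W.T + b_v)
        v_recon = (rng.random(p_v.shape) < p_v).astype(float)
        p_h_recon = sigmoid(v_recon @ W + b_h)
        # CD-1 update: data-driven minus reconstruction-driven statistics,
        # avoiding the intractable normalization term of maximum likelihood.
        W += lr * (data.T @ p_h - v_recon.T @ p_h_recon) / len(data)
        b_v += lr * (data - v_recon).mean(axis=0)
        b_h += lr * (p_h - p_h_recon).mean(axis=0)
    return W, b_v, b_h
```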
• Many layers of representation can be learned by treating the hidden states of one RBM as the visible data for training the next RBM (a composition of experts); see the stacking sketch below.
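A minimal sketch of that greedy layer-wise recipe, reusing the `train_rbm` and `sigmoid` helpers from the previous sketch: each trained RBM's hidden-unit probabilities serve as the "visible data" for training the next one. The layer sizes in the usage line are arbitrary examples.

```python
def train_stack(data, layer_sizes, **kwargs):
    """Greedily train a stack of RBMs, one layer at a time."""
    layers, x = [], data
    for n_hidden in layer_sizes:
        W, b_v, b_h = train_rbm(x, n_hidden, **kwargs)
        layers.append((W, b_v, b_h))
        # Deterministic up-pass: hidden probabilities become the next layer's input.
        x = sigmoid(x @ W + b_h)
    return layers

# Example usage (sizes are illustrative):
# stack = train_stack(binary_images, layer_sizes=[500, 500, 2000])
```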
• This creates good generative models that can then be fine-tuned.
  – Contrastive wake-sleep can fine-tune generation (a simplified sketch follows).
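For orientation only, a heavily simplified sketch of wake-sleep-style updates for one pair of layers with untied recognition (`R`) and generative (`G`) weights, again assuming the `sigmoid` helper above. The contrastive variant used to fine-tune deep belief nets additionally runs contrastive divergence at the top-level RBM, which this sketch omits; the uniform prior sample over hidden states is a crude assumption made for brevity.

```python
def wake_sleep_step(v_data, R, G, b_h, b_v, lr=0.01, rng=None):
    """One wake phase + one sleep phase for a single pair of layers."""
    rng = rng or np.random.default_rng(0)
    n = len(v_data)
    # Wake phase: infer hidden states with the recognition weights, then
    # adjust the generative weights to reconstruct the real data from them.
    h = (rng.random((n, R.shape[1])) < sigmoid(v_data @ R + b_h)).astype(float)
    p_v = sigmoid(h @ G + b_v)
    G += lr * h.T @ (v_data - p_v) / n
    b_v += lr * (v_data - p_v).mean(axis=0)
    # Sleep phase: "dream" data from the generative model, then adjust the
    # recognition weights to recover the hidden states that produced it.
    h_dream = (rng.random(h.shape) < 0.5).astype(float)  # crude uniform prior (assumption)
    v_dream = (rng.random(p_v.shape) < sigmoid(h_dream @ G + b_v)).astype(float)
    p_h = sigmoid(v_dream @ R + b_h)
    R += lr * v_dream.T @ (h_dream - p_h) / n
    b_h += lr * (h_dream - p_h).mean(axis=0)
    return R, G, b_h, b_v
```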