- Deep belief nets can benefit a lot from unlabeled data when labeled data is scarce.
  - They just use the labeled data for fine-tuning.
- Kernel methods, like Gaussian processes, work well on small labeled training sets but are very slow for large training sets.
- So when there is a lot of unlabeled data and only a little labeled data, combine the two approaches:
  - First learn a deep belief net without using the labels.
  - Then apply Gaussian process models to the deepest layer of features. This works better than using the raw data.
  - Use GPs to get the derivatives that are backpropagated through the deep belief net. This is a further win. It allows GPs to fine-tune complicated kernels.
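
The first two steps of the combined approach can be sketched in a few lines of NumPy. This is a minimal illustration, not the method used for the reported results: it assumes a greedy stack of binary RBMs trained with one-step contrastive divergence as the deep belief net, an RBF kernel, and synthetic data. All names (`train_rbm`, `pretrain_stack`, `features`, `gp_predict`) and all hyperparameters are hypothetical, and the GP-driven fine-tuning step is left out.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_rbm(V, n_hidden, epochs=50, lr=0.05, rng=None):
    """Train one binary RBM with CD-1. No labels are used."""
    rng = np.random.default_rng(0) if rng is None else rng
    n_visible = V.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v = np.zeros(n_visible)
    b_h = np.zeros(n_hidden)
    for _ in range(epochs):
        # Positive phase: hidden probabilities given the data.
        h_prob = sigmoid(V @ W + b_h)
        h_samp = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: one reconstruction step.
        v_recon = sigmoid(h_samp @ W.T + b_v)
        h_recon = sigmoid(v_recon @ W + b_h)
        W += lr * (V.T @ h_prob - v_recon.T @ h_recon) / len(V)
        b_v += lr * (V - v_recon).mean(axis=0)
        b_h += lr * (h_prob - h_recon).mean(axis=0)
    return W, b_h

def pretrain_stack(X, layer_sizes, **kw):
    """Greedy layer-wise pretraining: each RBM models the layer below."""
    layers, H = [], X
    for n_hidden in layer_sizes:
        W, b = train_rbm(H, n_hidden, **kw)
        layers.append((W, b))
        H = sigmoid(H @ W + b)
    return layers

def features(X, layers):
    """Deterministic up-pass to the deepest layer of features."""
    H = X
    for W, b in layers:
        H = sigmoid(H @ W + b)
    return H

def rbf_kernel(A, B, length_scale=1.0):
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gp_predict(F_train, y_train, F_test, length_scale=1.0, noise=1e-2):
    """GP regression posterior mean, computed on learned features."""
    y_mean = y_train.mean()
    K = rbf_kernel(F_train, F_train, length_scale) + noise * np.eye(len(F_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    return y_mean + rbf_kernel(F_test, F_train, length_scale) @ alpha

# Synthetic stand-in data: many unlabeled cases, few labeled ones.
rng = np.random.default_rng(0)
X_unlabeled = rng.random((500, 20))
X_lab = rng.random((30, 20))
y_lab = np.sin(3.0 * X_lab[:, 0]) + 0.1 * rng.standard_normal(30)
X_test = rng.random((10, 20))

layers = pretrain_stack(X_unlabeled, [16, 8], rng=rng)  # step 1: no labels
F_lab = features(X_lab, layers)
F_test = features(X_test, layers)
y_pred = gp_predict(F_lab, y_lab, F_test)               # step 2: GP on features
```

The GP only ever factorizes a 30 x 30 kernel matrix here, so its cubic cost in the number of labeled cases stays small, while the unlabeled set can grow without affecting that cost.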
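
The derivatives used for fine-tuning can be written out under some standard assumptions that the slide itself does not fix: GP regression with a zero-mean prior and Gaussian noise of variance \sigma^2, a differentiable kernel k, and f(x; W) denoting the deepest-layer features of the net with weights W. These symbols are introduced here for illustration only.

```latex
\begin{aligned}
K_{ij} &= k\big(f(\mathbf{x}_i; W),\, f(\mathbf{x}_j; W)\big) + \sigma^2 \delta_{ij}, \\
\mathcal{L}(W) &= \log p(\mathbf{y} \mid X, W)
  = -\tfrac{1}{2}\,\mathbf{y}^\top K^{-1}\mathbf{y}
    - \tfrac{1}{2}\log|K| - \tfrac{n}{2}\log 2\pi, \\
\frac{\partial \mathcal{L}}{\partial K} &=
  \tfrac{1}{2}\big(\boldsymbol{\alpha}\boldsymbol{\alpha}^\top - K^{-1}\big),
  \qquad \boldsymbol{\alpha} = K^{-1}\mathbf{y}, \\
\frac{\partial \mathcal{L}}{\partial W} &=
  \sum_{i,j} \frac{\partial \mathcal{L}}{\partial K_{ij}}\,
  \frac{\partial K_{ij}}{\partial W}.
\end{aligned}
```

The GP supplies the first factor in the last line; the second factor is obtained by backpropagating through the kernel and then through the layers of the net, which is how the labeled data comes to shape the features that define the kernel.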