Deep belief nets can benefit a lot from unlabeled data when labeled data is scarce: they use the labeled data only for fine-tuning.
Kernel methods, like Gaussian processes, work well on small labeled training sets but become slow on large ones: exact GP training costs O(n³) time in the number of training cases.
So when there is a lot of unlabeled data and only a little labeled data, combine the two approaches:
First, learn a deep belief net without using the labels.
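As a rough illustration, here is a minimal numpy sketch of greedy layer-wise pretraining with one-step contrastive divergence (CD-1). The layer sizes, learning rate, and epoch count are illustrative assumptions, not values from the source.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def train_rbm(data, n_hidden, lr=0.05, epochs=20):
        # One-step contrastive divergence (CD-1) for a binary RBM.
        n_visible = data.shape[1]
        W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        b_v = np.zeros(n_visible)   # visible biases
        b_h = np.zeros(n_hidden)    # hidden biases
        for _ in range(epochs):
            # Positive phase: hidden probabilities given the data.
            h_pos = sigmoid(data @ W + b_h)
            # Negative phase: one Gibbs step back to a reconstruction.
            h_sample = (rng.random(h_pos.shape) < h_pos).astype(float)
            v_neg = sigmoid(h_sample @ W.T + b_v)
            h_neg = sigmoid(v_neg @ W + b_h)
            # CD-1 weight and bias updates.
            W += lr * (data.T @ h_pos - v_neg.T @ h_neg) / len(data)
            b_v += lr * (data - v_neg).mean(axis=0)
            b_h += lr * (h_pos - h_neg).mean(axis=0)
        return W, b_h

    def pretrain_dbn(data, layer_sizes):
        # Greedy layer-wise pretraining: each RBM is trained on the
        # hidden activations of the one below; no labels are used.
        layers, x = [], data
        for n_hidden in layer_sizes:
            W, b_h = train_rbm(x, n_hidden)
            layers.append((W, b_h))
            x = sigmoid(x @ W + b_h)   # features for the next RBM up
        return layers, x               # x holds the deepest-layer features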
Then apply Gaussian process models to the deepest layer of features. This works better than using the raw data.
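A sketch of this step, assuming scikit-learn's GaussianProcessRegressor and reusing pretrain_dbn from the sketch above; the data shapes, layer sizes, and kernel choice are hypothetical.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    def forward(layers, x):
        # Propagate inputs through the pretrained stack to the deepest layer.
        for W, b_h in layers:
            x = 1.0 / (1.0 + np.exp(-(x @ W + b_h)))
        return x

    # Hypothetical data: many unlabeled rows, only a few labeled ones.
    rng = np.random.default_rng(0)
    unlabeled_x = rng.random((5000, 64))
    labeled_x, labeled_y = rng.random((40, 64)), rng.random(40)

    layers, _ = pretrain_dbn(unlabeled_x, layer_sizes=[256, 64])
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel())
    gp.fit(forward(layers, labeled_x), labeled_y)   # GP sees deep features
    mean, std = gp.predict(forward(layers, labeled_x), return_std=True)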
Then use GPs to get the derivatives that are back-propagated through the deep belief net. This is a further win: it allows GPs to fine-tune complicated, domain-specific kernels.
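A sketch of that fine-tuning, under the same assumptions as above: the gradient of the GP log marginal likelihood with respect to the deepest-layer features is computed analytically for an RBF kernel and then back-propagated through the sigmoid layers. The length scale, noise level, and learning rate are placeholders.

    import numpy as np

    def gp_feature_gradient(F, y, length_scale=1.0, noise=0.1):
        # Gradient of the GP log marginal likelihood w.r.t. the
        # feature matrix F, for an RBF kernel with additive noise.
        n = len(F)
        sq = ((F[:, None, :] - F[None, :, :]) ** 2).sum(-1)
        K_rbf = np.exp(-0.5 * sq / length_scale**2)
        K = K_rbf + noise**2 * np.eye(n)
        K_inv = np.linalg.inv(K)
        alpha = K_inv @ y
        # Standard result: dL/dK = 0.5 * (alpha alpha^T - K^{-1}).
        dL_dK = 0.5 * (np.outer(alpha, alpha) - K_inv)
        # Chain rule through K_ij = exp(-||f_i - f_j||^2 / (2 l^2)):
        # dL/df_i = sum_j 2 dL_dK_ij K_rbf_ij (f_j - f_i) / l^2.
        M = 2.0 * dL_dK * K_rbf / length_scale**2
        return M @ F - M.sum(axis=1, keepdims=True) * F

    def finetune_step(layers, x, y, lr=0.01):
        # One gradient-ascent step on the DBN weights, driven by the
        # GP's derivatives at the deepest layer of features.
        acts = [x]
        for W, b_h in layers:
            acts.append(1.0 / (1.0 + np.exp(-(acts[-1] @ W + b_h))))
        delta = gp_feature_gradient(acts[-1], y)     # dL/d(top features)
        for i in range(len(layers) - 1, -1, -1):
            W, b_h = layers[i]
            d = delta * acts[i + 1] * (1 - acts[i + 1])  # through sigmoid
            layers[i] = (W + lr * acts[i].T @ d, b_h + lr * d.sum(axis=0))
            delta = d @ W.T                          # pass down the stack
        return layers

Here the GP hyperparameters are held fixed for brevity; in practice they would be re-optimized alongside the network weights.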