• Deep belief nets can benefit a lot from unlabeled data
when labeled data is scarce.
– They use the labeled data only for fine-tuning.
• Kernel methods, like Gaussian processes, work well on
small labeled training sets but are very slow for large
training sets (exact inference is cubic in the number of
training cases).
• So when there is a lot of unlabeled data and only a little
labeled data, combine the two approaches:
– First learn a deep belief net without using the labels.
– Then apply Gaussian process models to the deepest
layer of features. This works better than using the raw
data.
– Use the GPs to get the derivatives that are
backpropagated through the deep belief net. This is a
further win: it lets the GPs fine-tune complicated
kernels.
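
The last two steps can be sketched roughly as follows. This is a minimal illustration, not the method's actual implementation: the `features` function is a hypothetical stand-in for the deepest layer of a pretrained deep belief net (here just one sigmoid layer with random weights), the squared-exponential kernel and noise level are assumed choices, and all function names are made up for this example. The negative log marginal likelihood is shown as the assumed objective whose derivatives would be backpropagated into the network; the backpropagation itself is not implemented here.

```python
import numpy as np

def features(X, W, b):
    """Hypothetical stand-in for the deepest feature layer of a pretrained net."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gp_fit(F, y, noise=0.1):
    """Return the Cholesky factor L of (K + noise^2 I) and alpha = (K + noise^2 I)^-1 y."""
    K = rbf_kernel(F, F) + noise**2 * np.eye(len(F))
    L = np.linalg.cholesky(K)  # this factorization is the cubic-cost step
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return L, alpha

def gp_predict_mean(F_train, alpha, F_test):
    """Posterior mean of the GP at the test features."""
    return rbf_kernel(F_test, F_train) @ alpha

def neg_log_marginal_likelihood(L, alpha, y):
    """Assumed fine-tuning objective: its derivatives with respect to the
    features are what would be backpropagated through the network."""
    n = len(y)
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * n * np.log(2.0 * np.pi)

# Toy usage on a small labeled set (synthetic data, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = np.sin(X[:, 0])
W = rng.normal(size=(5, 8))  # pretend these weights came from unsupervised pretraining
b = np.zeros(8)

F = features(X, W, b)             # the GP sees features, not raw inputs
L, alpha = gp_fit(F, y)
mean = gp_predict_mean(F, alpha, F)
nll = neg_log_marginal_likelihood(L, alpha, y)
```

In a full version, `W` and `b` would come from the unsupervised pretraining step and then be updated by gradient descent on `nll`, typically with an automatic differentiation library rather than hand-derived gradients.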