• Deep belief nets can benefit a lot from unlabeled data
when labeled data is scarce.
– They only need the labeled data for fine-tuning (a sketch of the unsupervised pre-training follows).
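A minimal sketch of the unsupervised phase, assuming the DBN is pre-trained greedily as a stack of binary RBMs with one-step contrastive divergence (CD-1); the layer sizes, learning rate, and toy data are illustrative, not from the source:

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def train_rbm(V, n_hidden, epochs=10, lr=0.05):
        """One-step contrastive divergence (CD-1) for a binary RBM."""
        n_visible = V.shape[1]
        W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        a = np.zeros(n_visible)                    # visible biases
        b = np.zeros(n_hidden)                     # hidden biases
        for _ in range(epochs):
            ph = sigmoid(V @ W + b)                # positive-phase hidden probs
            h = (rng.random(ph.shape) < ph) * 1.0  # sample hidden states
            pv = sigmoid(h @ W.T + a)              # one-step reconstruction
            ph2 = sigmoid(pv @ W + b)              # negative-phase hidden probs
            W += lr * (V.T @ ph - pv.T @ ph2) / len(V)
            a += lr * (V - pv).mean(axis=0)
            b += lr * (ph - ph2).mean(axis=0)
        return W, b

    # Greedy layer-wise pre-training on unlabeled data: each RBM's hidden
    # activations become the "visible" data for the next RBM in the stack.
    X = (rng.random((1000, 784)) < 0.5) * 1.0      # toy unlabeled binary data
    weights, data = [], X
    for n_hidden in (500, 50):
        W, b = train_rbm(data, n_hidden)
        weights.append((W, b))
        data = sigmoid(data @ W + b)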
• Kernel methods, like Gaussian processes, work well on small labeled training sets, but exact GP inference costs O(N³) in the number of training cases, so they are slow for large training sets.
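For concreteness, a toy GP regression sketch (plain NumPy, RBF kernel, all values illustrative); the Cholesky factorisation of the N×N kernel matrix is the O(N³) step that makes exact GPs expensive on large training sets:

    import numpy as np

    def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
        """Squared-exponential kernel matrix between two sets of points."""
        d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
        return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

    def gp_predict(X, y, X_star, noise=1e-2):
        """Exact GP regression; returns the predictive mean at X_star."""
        K = rbf_kernel(X, X) + noise * np.eye(len(X))
        L = np.linalg.cholesky(K)        # O(N^3): the bottleneck for large N
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
        return rbf_kernel(X_star, X) @ alpha

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, (50, 1))      # small labeled training set
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
    print(gp_predict(X, y, np.array([[0.0]])))   # prediction near sin(0) = 0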
• So when there is a lot of unlabeled data and only a little
labeled data, combine the two approaches:
– First learn a deep belief net without using the labels.
– Then fit Gaussian process models to the deepest layer of features. This gives better predictions than fitting the GP to the raw data.
– Then use the GP to get derivatives of its log marginal likelihood and back-propagate them through the deep belief net. This is a further win: it lets the GP fine-tune a complicated, domain-specific kernel (see the sketch after this list).
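A sketch of the combined approach, assuming PyTorch for the backward pass. The feed-forward dbn network, layer sizes, hyperparameters, and toy data are stand-ins, not from the source; a real run would initialise dbn from the unsupervised pre-training above. Minimising the GP's negative log marginal likelihood adapts the kernel hyperparameters and, through back-propagation, the DBN weights, so the net learns a domain-specific kernel:

    import torch

    def rbf_kernel(A, B, log_ls, log_var):
        """Squared-exponential kernel with log-parameterised hyperparameters."""
        d2 = (A[:, None, :] - B[None, :, :]).pow(2).sum(-1)
        return torch.exp(log_var) * torch.exp(-0.5 * d2 / torch.exp(2.0 * log_ls))

    # Hypothetical stand-in for the pre-trained DBN's recognition pass.
    dbn = torch.nn.Sequential(
        torch.nn.Linear(784, 500), torch.nn.Sigmoid(),
        torch.nn.Linear(500, 50), torch.nn.Sigmoid(),
    )

    log_ls = torch.zeros((), requires_grad=True)        # kernel lengthscale (log)
    log_var = torch.zeros((), requires_grad=True)       # kernel variance (log)
    log_noise = torch.tensor(-2.0, requires_grad=True)  # observation noise (log)

    def gp_nll(F, y):
        """GP negative log marginal likelihood on features F (up to a constant)."""
        N = F.shape[0]
        K = rbf_kernel(F, F, log_ls, log_var) + torch.exp(log_noise) * torch.eye(N)
        L = torch.linalg.cholesky(K)
        alpha = torch.cholesky_solve(y[:, None], L)
        return 0.5 * (y[None, :] @ alpha).squeeze() + torch.log(torch.diagonal(L)).sum()

    X = torch.rand(100, 784)                            # toy labeled inputs
    y = torch.randn(100)                                # toy labeled targets
    opt = torch.optim.SGD([*dbn.parameters(), log_ls, log_var, log_noise], lr=1e-3)

    for step in range(100):
        F = dbn(X)              # deepest-layer features
        loss = gp_nll(F, y)     # GP marginal likelihood on those features
        opt.zero_grad()
        loss.backward()         # derivatives flow back through the DBN
        opt.step()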