NIPS 2007 Tutorial on Deep Belief Nets


Unsupervised “pre-training” also helps for models that have more data and better priors


•	Ranzato et al. (NIPS 2006) used an additional 600,000 distorted digits.

•	They also used convolutional multilayer neural networks that have some built-in, local translational invariance.
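
A toy illustration of what "built-in, local translational invariance" can mean: the same small filter is applied at every position, and a max over a small window then discards the exact location of the match. This is only a minimal 1-D sketch with NumPy, not the architecture Ranzato et al. used; the kernel, threshold, pooling width and the helper name `pooled_features` are all hypothetical choices made for the example.

```python
import numpy as np

KERNEL = np.array([1.0, 2.0])   # hypothetical feature detector (assumption)
POOL = 4                        # hypothetical pooling width (assumption)
THRESHOLD = 4.0                 # hypothetical cut-off that keeps only strong matches

def pooled_features(x):
    """Shared-weight filtering, thresholding, then non-overlapping max pooling."""
    # The same kernel is slid over every position of the input.
    response = np.correlate(x, KERNEL, mode="valid")
    # Suppress weak partial matches, keep the strong one.
    active = np.maximum(response - THRESHOLD, 0.0)
    # The max over each window forgets where inside the window the match was.
    return active.reshape(-1, POOL).max(axis=1)

x = np.zeros(13)
x[4:6] = [1.0, 2.0]             # the pattern the kernel responds to

base = pooled_features(x)
small_shift = pooled_features(np.roll(x, 3))   # match stays inside one pooling window
large_shift = pooled_features(np.roll(x, 4))   # match crosses into the next window

print(base, small_shift, large_shift)
```

The pooled output is unchanged for shifts of up to three positions and changes once the pattern crosses a pooling boundary, which is why the invariance is described as local rather than global.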


Error rate with back-propagation alone: 0.49%

Error rate with unsupervised layer-by-layer pre-training followed by backprop: 0.39% (record)
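
To put the two figures in perspective, the short calculation below converts them into a relative improvement and into approximate error counts. The test-set size of 10,000 is an assumption (the size of the standard MNIST test set); the slide itself does not state it.

```python
backprop_only = 0.49   # error rate in percent, from the slide
pretrained = 0.39      # error rate in percent, from the slide

# Absolute gain in percentage points and gain relative to the baseline.
absolute_gain = backprop_only - pretrained
relative_gain = absolute_gain / backprop_only

# Assumption: a 10,000-image test set, as in standard MNIST.
test_set_size = 10_000
errors_backprop = round(backprop_only / 100 * test_set_size)
errors_pretrained = round(pretrained / 100 * test_set_size)

print(f"absolute gain: {absolute_gain:.2f} percentage points")
print(f"relative gain: {relative_gain:.1%}")
print(f"misclassified: {errors_backprop} vs {errors_pretrained}")
```

Under that assumption the difference is about ten test images, roughly a one-fifth reduction in errors relative to back-propagation alone.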