Unsupervised “pre-training” also helps for models that have more data and better priors
• Ranzato et al. (NIPS 2006) used an additional 600,000 distorted digits (one way to generate such distortions is sketched after the bullets).
• They also used convolutional multilayer neural networks that have some built-in, local translational invariance.
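A rough sketch of how extra “distorted digits” can be produced from the original training images. The specific distortions and parameters used in the paper are not given here, so the rotation, shift, and scale ranges below are illustrative assumptions, not the authors' recipe.

```python
# Generate extra training digits by applying small random distortions.
# Parameters are assumptions chosen for illustration only.
import torch
from torchvision import transforms

distort = transforms.RandomAffine(
    degrees=10,                 # small random rotations
    translate=(0.08, 0.08),     # small random shifts
    scale=(0.9, 1.1),           # slight rescaling
)

def make_distorted_copies(images, copies_per_image=10):
    """images: float tensor of shape (N, 1, 28, 28); returns distorted variants."""
    out = [distort(img) for img in images for _ in range(copies_per_image)]
    return torch.stack(out)
```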
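A minimal sketch of the kind of convolutional multilayer net the second bullet refers to: shared local filters plus pooling build in some tolerance to small translations of the input. The layer widths and kernel sizes are illustrative assumptions, not the architecture from the paper.

```python
# Small convolutional net for 28x28 digit images.
# Weight sharing (Conv2d) + pooling (MaxPool2d) gives local translational invariance.
import torch.nn as nn

class SmallConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=5),   # shared local filters
            nn.ReLU(),
            nn.MaxPool2d(2),                  # pooling: tolerant to small shifts
            nn.Conv2d(8, 16, kernel_size=5),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(16 * 4 * 4, 10)  # 10 digit classes

    def forward(self, x):
        h = self.features(x)                  # (N, 16, 4, 4)
        return self.classifier(h.flatten(1))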
Back-propagation alone: 0.49%
Unsupervised layer-by-layer pre-training followed by backprop: 0.39% (record)
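The pre-training result above follows the generic recipe of greedy layer-by-layer unsupervised learning followed by supervised fine-tuning. Below is a minimal sketch of that recipe using simple autoencoder layers, not the sparse convolutional feature learning Ranzato et al. actually used; layer widths, learning rate, and epochs are illustrative assumptions.

```python
# Greedy layer-by-layer unsupervised pre-training, then backprop fine-tuning.
import torch
import torch.nn as nn

sizes = [784, 500, 200]                       # assumed layer widths
layers = [nn.Linear(sizes[i], sizes[i + 1]) for i in range(len(sizes) - 1)]

def pretrain_layer(layer, batches, epochs=5, lr=0.05):
    """Train one layer as a small autoencoder: encode, decode, minimise reconstruction error."""
    decoder = nn.Linear(layer.out_features, layer.in_features)
    opt = torch.optim.SGD(list(layer.parameters()) + list(decoder.parameters()), lr=lr)
    for _ in range(epochs):
        for x in batches:                     # x: (batch, in_features); no labels used
            recon = decoder(torch.sigmoid(layer(x)))
            loss = ((recon - x) ** 2).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()

def pretrain_stack(layers, batches):
    for layer in layers:                      # greedy: one layer at a time
        pretrain_layer(layer, batches)
        # feed this layer's activations upward as the next layer's "data"
        batches = [torch.sigmoid(layer(x)).detach() for x in batches]

# After pre-training, stack the layers, add a 10-way output layer,
# and fine-tune the whole network with ordinary backprop on the labelled digits.
```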