First, model the distribution of digit images
The top two layers form a restricted
Boltzmann machine whose free-energy
landscape should model the low-dimensional
manifolds of the digits.
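As a rough illustration of what "free energy" means here, the following is a minimal sketch for a binary RBM, where F(v) = -b_vis·v - Σ_j log(1 + exp(b_hid_j + (vW)_j)). It assumes binary units and treats the 500-unit layer as the visible side and the 2000-unit layer as the hidden side of the top RBM. The names `free_energy`, `W`, `b_vis`, and `b_hid` are hypothetical, and the weights are random placeholders, not trained values.

```python
import numpy as np

def free_energy(v, W, b_vis, b_hid):
    """Free energy of a binary RBM for visible vectors v (one per row).

    F(v) = -b_vis . v - sum_j log(1 + exp(b_hid_j + (v W)_j))
    Low free energy corresponds to high probability under the model.
    """
    pre = v @ W + b_hid
    # np.logaddexp(0, x) computes log(1 + exp(x)) in a numerically stable way
    return -(v @ b_vis) - np.logaddexp(0.0, pre).sum(axis=-1)

# Assumed sizes: 500 units below, 2000 units on top (placeholder weights).
rng = np.random.default_rng(0)
n_vis, n_hid = 500, 2000
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b_vis = np.zeros(n_vis)
b_hid = np.zeros(n_hid)

# Four random binary activity patterns for the 500-unit layer.
v = (rng.random((4, n_vis)) < 0.5).astype(float)
F = free_energy(v, W, b_vis, b_hid)
print(F.shape)
```

In a trained model, activity patterns produced by real digits should sit in the low free-energy regions, which is the sense in which the landscape models the digit manifolds.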
[Figure labels: 2000 units (top layer), 500 units]
The network learns a density model for
unlabeled digit images. When we generate
from the model, we get things that look like
real digits of all classes.
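One common way to generate from a model of this shape is to run alternating Gibbs sampling in the top RBM and then do a single top-down pass to the pixels. The sketch below assumes that procedure, binary stochastic units, and a 500-500-2000 stack above a 28 x 28 image. All names (`generate`, `W_top`, `down_layers`) are hypothetical, and the weights are random placeholders, so the output here is noise rather than digits.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_bernoulli(p, rng):
    """Draw binary states with the given 'on' probabilities."""
    return (rng.random(p.shape) < p).astype(float)

def generate(W_top, b_pen, b_top, down_layers, n_gibbs, rng):
    """Sample an image: Gibbs sampling in the top RBM, then a top-down pass."""
    # Start the 500-unit layer from a random binary state.
    pen = sample_bernoulli(np.full(b_pen.shape, 0.5), rng)
    for _ in range(n_gibbs):
        top = sample_bernoulli(sigmoid(pen @ W_top + b_top), rng)
        pen = sample_bernoulli(sigmoid(top @ W_top.T + b_pen), rng)
    # Top-down pass; for simplicity this propagates probabilities
    # instead of sampling binary states at each lower layer.
    x = pen
    for W, b in down_layers:
        x = sigmoid(x @ W + b)
    return x

rng = np.random.default_rng(0)
# Placeholder (untrained) parameters with the assumed layer sizes.
W_top = 0.01 * rng.standard_normal((500, 2000))
b_pen, b_top = np.zeros(500), np.zeros(2000)
down_layers = [
    (0.01 * rng.standard_normal((500, 500)), np.zeros(500)),
    (0.01 * rng.standard_normal((500, 784)), np.zeros(784)),
]

img = generate(W_top, b_pen, b_top, down_layers, n_gibbs=20, rng=rng)
img = img.reshape(28, 28)  # pixel-on probabilities
print(img.shape)
```

With trained weights, running more Gibbs steps lets the top RBM settle into a low free-energy region before the image is read out.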
But do the hidden features really help with
digit discrimination?
Add 10 softmaxed units to the top and do
[Figure labels: 500 units, 28 x 28 pixel image (input)]