Modelling the distribution of digit images
The top two layers form a restricted Boltzmann machine whose
free-energy landscape should model the low-dimensional
manifolds of the digits.
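As a rough illustration of what "free energy" means here, the sketch below computes the standard free energy of a binary restricted Boltzmann machine, F(v) = -b_vis . v - sum_j log(1 + exp(b_hid_j + (v W)_j)). Low free energy corresponds to high probability under the model. The function name, variable names, and toy layer sizes are assumptions made for this example, not taken from the slide.

```python
import numpy as np

def free_energy(v, W, b_vis, b_hid):
    """Free energy of visible vector(s) v under a binary RBM.

    v:     (n_cases, n_vis) or (n_vis,) binary states
    W:     (n_vis, n_hid) weights
    b_vis: (n_vis,) visible biases
    b_hid: (n_hid,) hidden biases
    """
    v = np.atleast_2d(v)
    pre_act = v @ W + b_hid
    # logaddexp(0, x) equals log(1 + exp(x)) and is numerically stable.
    return -(v @ b_vis) - np.logaddexp(0.0, pre_act).sum(axis=1)

# Toy sizes (hypothetical), chosen so the hidden states can be enumerated.
rng = np.random.default_rng(0)
n_vis, n_hid = 6, 4
W = 0.1 * rng.standard_normal((n_vis, n_hid))
b_vis = 0.1 * rng.standard_normal(n_vis)
b_hid = 0.1 * rng.standard_normal(n_hid)

v = rng.integers(0, 2, size=(3, n_vis)).astype(float)
F = free_energy(v, W, b_vis, b_hid)
print(F)
```

The closed form avoids summing over all hidden configurations, which is what makes the free energy cheap to evaluate for an RBM.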
[Figure label: 2000 units]
The network learns a density model for
unlabeled digit images. When we generate
from the model we often get things that look
like real digits of all classes.
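One common way to generate from a model of this kind is to run alternating Gibbs sampling in the top-level restricted Boltzmann machine and then make a single top-down pass to the pixels. The sketch below assumes that procedure; the function names, the random (untrained) weights, and the toy layer sizes are all hypothetical stand-ins, so the output is noise rather than a digit.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_bernoulli(p, rng):
    """Draw binary states with the given on-probabilities."""
    return (rng.random(p.shape) < p).astype(float)

def generate(top_W, top_b_vis, top_b_hid, down_layers, n_gibbs, rng):
    """Sample from the top RBM, then propagate downward to the pixels.

    down_layers is a list of (W, b) generative layers, ordered from
    just below the top RBM down to the pixel layer.
    """
    # Start the chain from a random state of the RBM's lower layer.
    v = sample_bernoulli(np.full(top_b_vis.shape, 0.5), rng)
    for _ in range(n_gibbs):
        h = sample_bernoulli(sigmoid(v @ top_W + top_b_hid), rng)
        v = sample_bernoulli(sigmoid(h @ top_W.T + top_b_vis), rng)
    x = v
    for i, (W_down, b_down) in enumerate(down_layers):
        p = sigmoid(x @ W_down + b_down)
        # Sample intermediate layers; keep probabilities at the pixel layer.
        x = p if i == len(down_layers) - 1 else sample_bernoulli(p, rng)
    return x

# Toy, untrained network (assumed sizes): 12 <-> 8 top RBM, then 8 -> 6 -> 16.
rng = np.random.default_rng(0)
top_W = 0.1 * rng.standard_normal((8, 12))
top_b_vis, top_b_hid = np.zeros(8), np.zeros(12)
down_layers = [
    (0.1 * rng.standard_normal((8, 6)), np.zeros(6)),
    (0.1 * rng.standard_normal((6, 16)), np.zeros(16)),
]
img = generate(top_W, top_b_vis, top_b_hid, down_layers, n_gibbs=50, rng=rng)
print(img.reshape(4, 4))
```

With trained weights, the returned vector would be pixel probabilities that can be reshaped and viewed as an image.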
More hidden layers make the generated
fantasies look better (YW Teh, Simon Osindero).
But do the hidden features really help with
digit discrimination? To test this, add a layer of 10 softmax
units on top and train with backpropagation.
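A minimal sketch of this step, assuming logistic hidden units and a cross-entropy loss: a 10-way softmax layer sits on top of the feature layers and one backpropagation step updates every layer. The hidden layer sizes, the random stand-in weights and data, the learning rate, and the function names are assumptions for illustration; in practice the lower layers would start from the pretrained weights.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(x, layers, W_out, b_out):
    """Return all layer activations and the class probabilities."""
    acts = [x]
    for W, b in layers:
        acts.append(sigmoid(acts[-1] @ W + b))
    return acts, softmax(acts[-1] @ W_out + b_out)

def backprop_step(x, y_onehot, layers, W_out, b_out, lr=0.1):
    """One gradient step on mean cross-entropy; updates arrays in place."""
    acts, probs = forward(x, layers, W_out, b_out)
    loss = -np.mean(np.sum(y_onehot * np.log(probs + 1e-12), axis=1))
    delta = (probs - y_onehot) / x.shape[0]  # gradient w.r.t. the logits
    grad_W_out, grad_b_out = acts[-1].T @ delta, delta.sum(axis=0)
    # Propagate the error into the top feature layer before updating W_out.
    delta = (delta @ W_out.T) * acts[-1] * (1.0 - acts[-1])
    W_out -= lr * grad_W_out
    b_out -= lr * grad_b_out
    for i in range(len(layers) - 1, -1, -1):
        W, b = layers[i]
        grad_W, grad_b = acts[i].T @ delta, delta.sum(axis=0)
        if i > 0:
            delta = (delta @ W.T) * acts[i] * (1.0 - acts[i])
        W -= lr * grad_W
        b -= lr * grad_b
    return loss

# Stand-in data: 20 random "images" of 28 x 28 = 784 pixels, random labels.
rng = np.random.default_rng(0)
sizes = [784, 32, 32, 64]  # hidden sizes are toy values, not the slide's
layers = [(0.05 * rng.standard_normal((m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]
W_out, b_out = 0.05 * rng.standard_normal((sizes[-1], 10)), np.zeros(10)
x = rng.random((20, 784))
y_onehot = np.eye(10)[rng.integers(0, 10, size=20)]

losses = [backprop_step(x, y_onehot, layers, W_out, b_out) for _ in range(50)]
print(losses[0], losses[-1])
```

Because the labels here are random, a falling loss only shows that the gradients are wired correctly, not that the features are useful for discrimination.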
[Figure label: 28 x 28 pixel image]