Making dumb backpropagation work really
well for recognizing digits
Use the standard viewing transformations plus
local deformation fields to generate LOTS of training data.
Use a single hidden layer with very small initial
weights:
it needs to break symmetry very slowly to find
a good local minimum
Use a more appropriate error measure for
multi-class categorization.
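
To make the idea of a local deformation field concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the procedure behind the slide: the function names, the parameters `alpha` (displacement scale) and `sigma` (smoothing width), and the uniform random field blurred with a Gaussian are all choices made for this example.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel, truncated at 3 sigma and normalised to sum to 1."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def smooth(field, sigma):
    """Separable Gaussian blur: along rows, then along columns."""
    k = gaussian_kernel(sigma)
    # np.convolve(mode="same") keeps the row length only if the kernel is shorter.
    assert len(k) <= min(field.shape), "kernel must fit inside the image"
    field = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, field)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, field)

def elastic_deform(image, alpha=8.0, sigma=4.0, rng=None):
    """Warp a 2-D float image with a smooth random displacement field.

    alpha and sigma are hypothetical parameter names chosen for this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    # Random per-pixel displacements, smoothed so neighbouring pixels move together.
    dx = smooth(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    dy = smooth(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Source coordinates for each output pixel, clipped to the image bounds.
    sx = np.clip(xs + dx, 0, w - 1)
    sy = np.clip(ys + dy, 0, h - 1)
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    fx, fy = sx - x0, sy - y0
    # Bilinear interpolation between the four surrounding pixels.
    top = image[y0, x0] * (1 - fx) + image[y0, x1] * fx
    bottom = image[y1, x0] * (1 - fx) + image[y1, x1] * fx
    return top * (1 - fy) + bottom * fy

# Usage: generate several distorted copies of one 28x28 training image.
rng = np.random.default_rng(0)
digit = np.zeros((28, 28))
digit[6:22, 12:16] = 1.0  # a crude vertical stroke standing in for a digit
augmented = [elastic_deform(digit, rng=rng) for _ in range(5)]
```

Each call draws a fresh field, so one labelled image yields as many distinct training examples as desired. The viewing transformations (shifts, rotations, scalings) would be applied in the same spirit and are omitted here.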
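
The slide does not name the error measure. A common choice for multi-class problems, assumed here purely for illustration, is a softmax output layer trained with cross-entropy instead of squared error. The sketch below uses hypothetical function names and plain NumPy.

```python
import numpy as np

def log_softmax(logits):
    """Row-wise log-softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def cross_entropy(logits, labels):
    """Mean negative log-probability assigned to the correct class."""
    n = logits.shape[0]
    return -log_softmax(logits)[np.arange(n), labels].mean()

def cross_entropy_grad(logits, labels):
    """Gradient of the mean cross-entropy with respect to the logits.

    For softmax outputs this is (predicted probabilities - one-hot targets) / n.
    """
    n = logits.shape[0]
    grad = np.exp(log_softmax(logits))
    grad[np.arange(n), labels] -= 1.0
    return grad / n

# Usage: a batch of 4 cases, 10 digit classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 10))
labels = np.array([3, 1, 4, 1])
loss = cross_entropy(logits, labels)
grad = cross_entropy_grad(logits, labels)
```

With this pairing, the gradient at the output is just the difference between predicted and target probabilities, so a confidently wrong output still gets a large error signal. With squared error on saturating units, that signal can become very small.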