A brute force approach
LeNet uses knowledge about the invariances to design:
the network architecture
or the weight constraints
or the types of feature
But it's much simpler to incorporate knowledge of
invariances by just creating extra training data:
for each training image, produce new training data by
applying all of the transformations we want to be
insensitive to (LeNet can benefit from this too)
Then train a large, dumb net on a fast computer.
This works surprisingly well if the transformations are
not too big (so do approximate normalization first).
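As a rough illustration of creating extra training data, the sketch below expands a training set with small translations only. The names `shift_image`, `augment`, and `max_shift` are hypothetical and not from the lecture; images are assumed to be plain lists of pixel rows, and other transformations (rotation, scaling) would need interpolation that is not shown here.

```python
def shift_image(image, dx, dy, fill=0):
    """Return a copy of `image` (a list of rows) shifted dx columns right and dy rows down."""
    height = len(image)
    width = len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            src_y, src_x = y - dy, x - dx
            # Pixels shifted in from outside the frame keep the fill value.
            if 0 <= src_y < height and 0 <= src_x < width:
                shifted[y][x] = image[src_y][src_x]
    return shifted


def augment(images, labels, max_shift=1):
    """Expand a training set with every translation of up to `max_shift` pixels."""
    aug_images, aug_labels = [], []
    for image, label in zip(images, labels):
        for dy in range(-max_shift, max_shift + 1):
            for dx in range(-max_shift, max_shift + 1):
                aug_images.append(shift_image(image, dx, dy))
                aug_labels.append(label)  # a small shift does not change the class
    return aug_images, aug_labels
```

With `max_shift=1` each training image becomes 9 labelled examples (the original plus 8 shifted copies). Keeping `max_shift` small matches the point that the transformations should not be too big.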