A commonsense way to use limited
computational resources
First, train a model on all of the data.
Let's assume it gets the great majority of the cases
right.
Then train another model on all the cases the first model
got wrong plus an equal number that it got right.
This focusses the resources on modelling the hard
cases.
Train a third model focussing on cases that either or
both previous models got wrong.
Then use a simple committee of the three models.
This is quite effective for learning to recognize
handwritten digits, but it is also very heuristic.
Can we give it a theoretical foundation?
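
The three-model procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method as originally implemented: the names `fit_stump` and `train_committee` are hypothetical, the weak learner is a single-feature threshold rule on 0/1 labels chosen purely to keep the example self-contained, and the "simple committee" is assumed to be a majority vote.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Hypothetical weak learner: the best single-feature threshold rule (labels 0/1)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for sign in (0, 1):
                # Rule: predict `sign` when x[j] >= t, otherwise 1 - sign.
                errors = sum(
                    (sign if x[j] >= t else 1 - sign) != label
                    for x, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, sign)
    _, j, t, sign = best
    return lambda x: sign if x[j] >= t else 1 - sign


def train_committee(X, y, fit=fit_stump, seed=0):
    """Train three models as described in the text and return a voting predictor."""
    rng = random.Random(seed)
    idx = range(len(X))

    # Model 1: trained on all of the data.
    m1 = fit(X, y)
    wrong1 = [i for i in idx if m1(X[i]) != y[i]]
    right1 = [i for i in idx if m1(X[i]) == y[i]]

    # Model 2: every case model 1 got wrong, plus an equal number it got right.
    sample = wrong1 + rng.sample(right1, min(len(wrong1), len(right1)))
    m2 = fit([X[i] for i in sample], [y[i] for i in sample]) if sample else m1

    # Model 3: cases that either or both previous models got wrong.
    hard = [i for i in idx if m1(X[i]) != y[i] or m2(X[i]) != y[i]]
    m3 = fit([X[i] for i in hard], [y[i] for i in hard]) if hard else m1

    def predict(x):
        # Assumed committee rule: majority vote of the three models.
        votes = Counter(m(x) for m in (m1, m2, m3))
        return votes.most_common(1)[0][0]

    return predict


# Usage on a tiny, linearly separable toy set (illustrative data only).
X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
predict = train_committee(X, y)
labels = [predict(x) for x in X]
```

When the first model makes no mistakes, as on this toy set, there are no hard cases to focus on, so the sketch simply reuses the first model for the other two committee members. With real data the second and third training sets are much smaller than the full set, which is where the saving in computation comes from.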