
















• First train a model on all of the data.
  – Let's assume it gets the great majority of the cases right.


• Then train another model on all the cases the first model got wrong plus an equal number that it got right.
  – This focuses the resources on modelling the hard cases.


• Train a third model focusing on cases that either or both of the previous models got wrong.
  – Then use a simple committee of the three models.


• This is quite effective for learning to recognize handwritten digits, but it is also very heuristic.
  – Can we give it a theoretical foundation?
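The three-model procedure above can be sketched in Python roughly as follows. This is a minimal illustration under assumptions of our own, not the lecture's implementation: labels are +1/-1, the weak learner is a hypothetical brute-force decision stump (`train_stump`), and the names `boost_three` and `seed`, plus the random choice of which correctly classified cases go into the second training set, are all illustrative.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]
Classifier = Callable[[Point], int]


def train_stump(X: List[Point], y: List[int]) -> Classifier:
    """Hypothetical weak learner: a one-feature threshold rule predicting +1/-1.

    Brute-force search over every feature, every observed value as a threshold,
    and both polarities; keeps the rule with the fewest training errors.
    """
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for sign in (1, -1):
                errors = sum(
                    (sign if x[f] > t else -sign) != label for x, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, sign)
    _, f, t, sign = best
    return lambda x: sign if x[f] > t else -sign


def boost_three(
    X: List[Point], y: List[int], train=train_stump, seed: int = 0
) -> Tuple[Classifier, Tuple[Classifier, Classifier, Classifier]]:
    rng = random.Random(seed)
    idx = range(len(X))

    def subset(ids):
        return [X[i] for i in ids], [y[i] for i in ids]

    # Model 1: trained on all of the data.
    h1 = train(X, y)
    wrong = [i for i in idx if h1(X[i]) != y[i]]
    right = [i for i in idx if h1(X[i]) == y[i]]

    # Model 2: every case model 1 got wrong plus an equal number it got right
    # (or all of the right ones, if there are fewer right than wrong).
    if wrong:
        sample = wrong + rng.sample(right, min(len(wrong), len(right)))
        h2 = train(*subset(sample))
    else:
        h2 = h1  # model 1 is already perfect on the training set

    # Model 3: cases that either or both of the previous models got wrong.
    hard = [i for i in idx if h1(X[i]) != y[i] or h2(X[i]) != y[i]]
    h3 = train(*subset(hard)) if hard else h1

    def committee(x: Point) -> int:
        # Simple committee: majority vote of three +1/-1 predictions (never a tie).
        return 1 if h1(x) + h2(x) + h3(x) > 0 else -1

    return committee, (h1, h2, h3)


if __name__ == "__main__":
    # Toy 1-D data whose positive cases form an interval.
    X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
    y = [-1, -1, 1, 1, -1, -1]
    vote, models = boost_three(X, y)
    print([vote(x) for x in X])
```

In the toy example no single threshold separates the classes, which is the situation where the second and third models have something to add; how well the committee does there depends on which correctly classified cases get sampled for the second model.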

