 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
| • |
Can we do better
that just averaging predictors in a way
|
|
|
that does not
depend on the particular training case?
|
|
|
– |
Maybe
we can look at the input data for a particular
|
|
|
case
to help us decide which model to rely on.
|
|
|
• |
This
may allow particular models to specialize in a subset of
|
|
|
the
training cases. They do not learn on cases for which they
|
|
|
are
not picked. So they can ignore stuff they are not good at
|
|
|
modeling.
|
|
|
| • |
The key idea is
to make each expert focus on predicting
|
|
|
the right answer
for the cases where it is already doing
|
|
|
better than the
other experts.
|
|
|
– |
This
causes specialization.
|
|
|
– |
If
we always average all the predictors, each model is
|
|
trying
to compensate for the combined error made by
|
|
|
all
the other models.
|
|