lec15

Mixtures of Experts

• Can we do better than just averaging predictors in a way that does not depend on the particular training case?
  – Maybe we can look at the input data for a particular case to help us decide which model to rely on.

• This may allow particular models to specialize in a subset of the training cases. They do not learn on cases for which they are not picked, so they can ignore the cases they are not good at modeling.

• The key idea is to make each expert focus on predicting the right answer for the cases where it is already doing better than the other experts.
  – This causes specialization.
  – If we always average all the predictors, each model is trying to compensate for the combined error made by all the other models.
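
The contrast between averaging and letting experts specialize can be sketched numerically. The example below is an illustration under assumptions, not something taken from these slides. It assumes a squared-error loss, a softmax gate computed from the input, and one common form of mixture objective in which each expert's own squared error is weighted by its gate probability. The names `W`, `x`, `averaging_grads`, and `mixture_grads` are hypothetical.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def averaging_grads(y, t):
    """Gradient of (t - mean(y))**2 with respect to each expert output y_i.

    Every expert receives the same gradient, driven by the combined residual.
    """
    n = len(y)
    residual = t - y.mean()
    return np.full(n, -2.0 * residual / n)

def mixture_grads(y, t, p):
    """Gradient of sum_i p_i * (t - y_i)**2 with respect to each y_i.

    Each expert sees only its own residual, scaled by its gate probability.
    """
    return -2.0 * p * (t - y)

t = 1.0                      # target for this training case
y = np.array([0.9, 3.0])     # expert 0 is close to the target, expert 1 is far off

# Hypothetical gating network: a linear map of the input followed by a softmax,
# so the choice of expert depends on the particular case.
x = np.array([1.0, 0.0])
W = np.array([[2.0, 0.0],
              [-2.0, 0.0]])
p = softmax(W @ x)

g_avg = averaging_grads(y, t)
g_mix = mixture_grads(y, t, p)

print("gate probabilities:", p)
print("gradients when averaging:", g_avg)
print("gradients with gated mixture loss:", g_mix)
```

With plain averaging, expert 0 gets a positive gradient, so a gradient-descent step lowers its output even though it is already slightly below the target: it is being asked to compensate for expert 1's error. With the gated loss, expert 0 is pulled toward the target, while expert 1 receives only a small gradient on this case because the gate rarely picks it.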