Making an error function that encourages
specialization instead of cooperation
[Equation label: the average of all the predictors]
If we want to encourage cooperation,
we compare the average of all the
predictors with the target and train to
reduce the discrepancy.
This can overfit badly. It makes the
model much more powerful than
training each predictor separately.
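In symbols, assuming t is the target, y_i is the output of predictor i, N is the number of predictors, and angle brackets denote an average over predictors (notation assumed here), the cooperative error and its gradient can be sketched as:

$$
E = \bigl(t - \langle y_i \rangle_i\bigr)^2,
\qquad
\frac{\partial E}{\partial y_i} = -\frac{2}{N}\,\bigl(t - \langle y_j \rangle_j\bigr)
$$

Every predictor receives the same gradient, which depends only on the residual of the average. Each predictor is therefore trained to compensate for the combined error of the others, which is one way to see why the combination is more powerful than separately trained predictors.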
If we want to encourage specialization,
we compare each predictor separately
with the target and train to reduce the
average of all these discrepancies.
It's best to use a weighted average,
where the weights, p, are the
probabilities of picking that “expert”
for the particular training case.
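In symbols, assuming t is the target, y_i is the output of expert i, and p_i is the weight for expert i on this case (notation assumed here), the weighted version of this error and its gradient can be sketched as:

$$
E = \sum_i p_i \,(t - y_i)^2,
\qquad
\frac{\partial E}{\partial y_i} = -2\,p_i\,(t - y_i),
\qquad
\sum_i p_i = 1
$$

The gradient for expert i depends only on its own discrepancy, scaled by p_i, so an expert that is unlikely to be picked for a case is barely changed by it.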
[Equation label: p_i is the probability of picking expert i for this case]
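A minimal numerical sketch of the two error functions may make the contrast concrete. The function names, the target `t`, the outputs `y`, and the probabilities `p` below are all hypothetical illustration values, and the outputs are treated as free parameters rather than as the outputs of real networks.

```python
import numpy as np


def cooperation_error(y, t):
    """Squared error between the target and the plain average of the predictors."""
    return (t - np.mean(y)) ** 2


def specialization_error(y, t, p):
    """Probability-weighted average of each predictor's own squared error."""
    return np.sum(p * (t - y) ** 2)


# Hypothetical values for a single training case.
t = 1.0                          # target
y = np.array([0.9, 1.4, 0.1])    # outputs of three predictors ("experts")
p = np.array([0.7, 0.2, 0.1])    # assumed probabilities of picking each expert

coop = cooperation_error(y, t)         # about 0.04
spec = specialization_error(y, t, p)   # about 0.12

# Gradients of each error with respect to the individual outputs y_i.
# Cooperation: every predictor gets the same gradient, set by the average's residual.
grad_coop = -2.0 * (t - np.mean(y)) / len(y) * np.ones_like(y)
# Specialization: each expert's gradient depends only on its own residual and its p_i.
grad_spec = -2.0 * p * (t - y)

print("cooperation error:", coop, "gradient:", grad_coop)
print("specialization error:", spec, "gradient:", grad_spec)
```

With these numbers the second predictor already overshoots the target (1.4 versus 1.0). Under the cooperative error its gradient is negative, so gradient descent pushes it even higher to pull the average up. Under the specialization error its gradient is positive, so it moves back toward the target.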