Lecture 7

Learning a simplified HMoE

• There is a very efficient way to learn an HMoE if we make two assumptions:

• Linear experts: make every expert's output a linear function of the input, and use a squared-error cost.

  – This makes it possible to fit an expert non-iteratively (as a weighted least-squares problem) if we know how much responsibility it has for each training case.

• Generalized linear gating networks: make each gating network a softmax applied to a linear transformation of the input vector.

  – This makes it possible to fit each gating network quickly if we know what probabilities it should output for each case. The cost function is convex.

  – The fitting uses IRLS (iteratively reweighted least squares).
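
The two fitting steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the procedure from the lecture: the function names `fit_linear_expert` and `fit_gate_irls`, the small `ridge` term, and the synthetic data are all hypothetical, and the gate is restricted to two experts so that the softmax reduces to a logistic sigmoid. The responsibilities are taken as given; in a full HMoE they would come from the E-step.

```python
import numpy as np

def fit_linear_expert(X, y, r, ridge=1e-8):
    """Weighted least squares: minimise sum_n r[n] * (y[n] - X[n] @ w)**2.

    Solved in one shot via the normal equations (no iteration).
    `ridge` is an assumed small stabiliser, not part of the slide.
    """
    Xr = X * r[:, None]                       # scale each row by its responsibility
    A = X.T @ Xr + ridge * np.eye(X.shape[1])
    b = Xr.T @ y
    return np.linalg.solve(A, b)

def fit_gate_irls(X, t, n_iter=25, ridge=1e-8):
    """IRLS (Newton's method) for a two-way gate p = sigmoid(X @ v).

    `t` holds the target probabilities in [0, 1] that the gate should
    output for each case. The cross-entropy cost is convex in v.
    """
    v = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ v)))    # current gate outputs
        s = p * (1.0 - p)                     # per-case weights, recomputed each pass
        grad = X.T @ (p - t)                  # gradient of the cross-entropy
        H = X.T @ (X * s[:, None]) + ridge * np.eye(X.shape[1])
        v = v - np.linalg.solve(H, grad)      # Newton / reweighted least-squares step
    return v

# Synthetic check (all values below are made up for the illustration).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])  # bias + 2 inputs
w_true = np.array([1.0, -2.0, 0.5])
v_true = np.array([0.3, 1.0, -1.0])
y = X @ w_true                                # noiseless linear targets
r = 1.0 / (1.0 + np.exp(-(X @ v_true)))       # stand-in responsibilities for expert 1

w_hat = fit_linear_expert(X, y, r)            # one linear solve
v_hat = fit_gate_irls(X, r)                   # a few reweighted solves
```

The expert needs a single linear solve because its cost is quadratic in the weights. The gate needs several, because the per-case weights `s` depend on the current outputs and must be recomputed on every pass, which is where the "reweighted" in IRLS comes from.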