lecture 7


Making predictions once the tree has been

	learned

•

If we are doing regression and our loss function

is the squared error, then average the outputs of

all the experts using the probabilities of paths

through the gating tree as the weights.

•

This avoids discontinuities at boundaries

between regions, because the probabilities are

soft.

–

Its like the use of sigmoid units to allow

gradient descent in training feed-forward nets