 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
| • |
If we are doing
regression and our loss function
|
|
|
is the squared
error, then average the outputs of
|
|
all the experts
using the probabilities of paths
|
|
|
through the
gating tree as the weights.
|
|
|
| • |
This avoids
discontinuities at boundaries
|
|
|
between regions,
because the probabilities are
|
|
|
soft.
|
|
|
|
– |
Its
like the use of sigmoid units to allow
|
|
|
gradient
descent in training feed-forward nets
|
|