• Instead of just using a hierarchy of gating nets, also use a hierarchy of experts.
• Learn the whole system by greedy divide-and-conquer.
  – Start by learning a single expert.
  – Then make two slightly different copies of the expert, and use EM to rapidly fit an MoE with one gating network and two experts.
  – Now split each of these two experts. Use the previous gating network as the initial top-level gating net and add two new gating nets (with zero weights) at the next level down.
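The growth step above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the method as implemented by the author: the experts are taken to be linear regressors, the gating nets linear-softmax, and the names `Expert`, `Gate`, `split` and `count_leaves` are hypothetical. The EM fitting between splits is omitted, and random gating weights stand in for its result. The sketch shows the initialisation only: each split keeps the existing gating weights and places a zero-weight gate over two perturbed copies of each expert.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class Expert:
    """Leaf node: a linear regressor y = X @ w (assumed form of expert)."""
    def __init__(self, w):
        self.w = np.asarray(w, dtype=float)
    def predict(self, X):
        return X @ self.w

class Gate:
    """Internal node: linear-softmax gating net over two children."""
    def __init__(self, V, children):
        self.V = np.asarray(V, dtype=float)    # shape (n_features, 2)
        self.children = children
    def predict(self, X):
        g = softmax(X @ self.V)                # (n, 2) mixing proportions
        preds = np.stack([c.predict(X) for c in self.children], axis=1)
        return (g * preds).sum(axis=1)         # gate-weighted mean prediction

def split(node, rng, noise=1e-3):
    """Replace each leaf by a zero-weight gate over two perturbed copies."""
    if isinstance(node, Expert):
        kids = [Expert(node.w + noise * rng.standard_normal(node.w.shape))
                for _ in range(2)]
        # Zero gating weights give a 50/50 mix, so the output barely changes.
        return Gate(np.zeros((node.w.shape[0], 2)), kids)
    node.children = [split(c, rng, noise) for c in node.children]
    return node                                # existing gate weights are kept

def count_leaves(node):
    if isinstance(node, Expert):
        return 1
    return sum(count_leaves(c) for c in node.children)

rng = np.random.default_rng(0)
X = np.c_[rng.standard_normal((200, 2)), np.ones(200)]    # last column is a bias
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)      # toy regression target

w, *_ = np.linalg.lstsq(X, y, rcond=None)   # step 1: learn a single expert
root = split(Expert(w), rng)                # step 2: one gate, two experts
# ... EM fitting of the two-expert MoE would go here (omitted) ...
root.V = rng.standard_normal(root.V.shape)  # stand-in for EM-fitted gate weights
top_V = root.V.copy()
root = split(root, rng)                     # step 3: four experts, two new gates
```

Because the new gating nets start at zero, each one initially mixes its two children equally, so the enlarged model starts out predicting almost exactly what the smaller one did. In this sketch the next round of fitting would therefore begin from the previous solution rather than from scratch.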