lecture 7


A different (and better?) type of hierarchy for a mixture of experts

• Instead of just using a hierarchy of gating nets, also use a hierarchy of experts.

• Learn the whole system by greedy divide-and-conquer.

– Start by learning a single expert.

– Then make two slightly different copies of the expert, and use EM to rapidly fit an MoE with one gating network and two experts.

– Now split each of these two experts. Use the previous gating network as the initial top-level gating net and add two new gating nets (with zero weights) at the next level down.
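
The splitting step described above can be sketched in code. This is only an illustration under stated assumptions, not the method as presented in the lecture: the experts are assumed to be linear, each gating net is assumed to be a single logistic unit choosing between two children, and the names Leaf, Node and split are hypothetical. The EM refit between splits is left out. The point of the sketch is that a zero-weight gate sends every input 50/50 to two near-identical copies, so a split barely changes the model's output and EM starts from a good fit.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Leaf:
    """A linear expert (assumed form): y = x . w"""
    def __init__(self, w):
        self.w = np.asarray(w, dtype=float)

    def predict(self, X):
        return X @ self.w

class Node:
    """A logistic gating net (assumed form) that mixes two children."""
    def __init__(self, v, left, right):
        self.v = np.asarray(v, dtype=float)
        self.left, self.right = left, right

    def predict(self, X):
        g = sigmoid(X @ self.v)  # probability of routing to the left child
        return g * self.left.predict(X) + (1.0 - g) * self.right.predict(X)

def split(tree, rng, noise=1e-2):
    """Split every expert into two slightly different copies.

    Existing gating nets are kept unchanged as the upper levels;
    each new gating net starts with zero weights (a 50/50 split).
    """
    if isinstance(tree, Leaf):
        d = tree.w.shape[0]
        left = Leaf(tree.w + noise * rng.standard_normal(d))
        right = Leaf(tree.w + noise * rng.standard_normal(d))
        return Node(np.zeros(d), left, right)
    return Node(tree.v, split(tree.left, rng, noise), split(tree.right, rng, noise))

# Greedy growth: one expert -> two experts -> four experts.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
y = X @ np.array([1.0, -2.0, 0.5])

w0, *_ = np.linalg.lstsq(X, y, rcond=None)  # start by learning a single expert
tree = Leaf(w0)
tree = split(tree, rng)   # one gating net, two experts (then refit with EM)
tree = split(tree, rng)   # old gate on top, two new zero-weight gates below
```

In a full implementation each call to split would be followed by an EM fit of the gating nets and experts before splitting again; that step is omitted here.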