Mixtures of experts
Instead of training one big neural net to deal with every
training case, use several small neural nets.
Each small net is good at a particular subset of the
cases.
There is a “manager” or “gating network” that decides
which small net should be used for each case.
The gating net and the expert nets are all trained at
the same time to minimize one big objective function.
This will be covered in detail in the lecture on Oct 23.
Meanwhile, you could read the 1991 paper called
“Adaptive mixtures of local experts” on my webpage.
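
As a rough illustration of the setup described above (several small experts, a gating net, and one shared objective), here is a minimal sketch. It is not taken from the lecture or the paper. It assumes linear experts, a linear-plus-softmax gating net, and a gate-weighted squared-error objective, and the names moe_forward, moe_loss, expert_W and gate_W are made up for this example.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, shifted by the row max for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def moe_forward(x, expert_W, gate_W):
    """Forward pass of a toy mixture of experts (illustrative assumption).

    x:        (n, d) batch of input cases
    expert_W: (k, d, m) one linear map per expert (assumed linear experts)
    gate_W:   (d, k) weights of the gating net (assumed linear + softmax)
    """
    gate = softmax(x @ gate_W)                           # (n, k), each row sums to 1
    expert_out = np.einsum('nd,kdm->nkm', x, expert_W)   # (n, k, m), one output per expert
    mixed = np.einsum('nk,nkm->nm', gate, expert_out)    # (n, m), gate-weighted blend
    return gate, expert_out, mixed

def moe_loss(gate, expert_out, target):
    """One shared objective: gate-weighted squared error of each expert.

    This particular form is an assumption chosen for simplicity; the
    slide only says that a single objective is minimized.
    """
    sq_err = ((target[:, None, :] - expert_out) ** 2).sum(axis=2)  # (n, k)
    return (gate * sq_err).sum(axis=1).mean()
```

Because the loss depends on both gate_W and expert_W, taking gradients of this one number with respect to both sets of weights would train the gating net and the experts at the same time.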