Project on Mixtures of Experts
Mixtures of Experts will be explained in the lecture on Oct
15. Instead of training one big neural net to deal with every training
case, a mixture of experts uses several small neural nets. Each small net is good at a
particular subset of the cases. There is a "manager" or "gating
network" that decides which small net should be used for each case.
The gating net and the expert nets are all trained at the same time to
minimize one big objective function.
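As a purely illustrative sketch, here is one case's forward pass and the joint objective from the last part of the 1991 mixtures-of-experts paper (the training procedure referred to under "Comparing results" below), written in Python with numpy. It assumes linear experts and a softmax gating net; all names and shapes are made up for illustration.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def moe_loss(x, y, expert_weights, gate_weights):
        # Each expert makes its own prediction o_i for the input x.
        outputs = [W @ x for W in expert_weights]
        # The gating net turns the same input into mixing proportions p_i.
        p = softmax(gate_weights @ x)
        # Joint objective:  E = -log( sum_i p_i * exp(-0.5 * ||y - o_i||^2) )
        sq_err = np.array([np.sum((y - o) ** 2) for o in outputs])
        return -np.log(np.sum(p * np.exp(-0.5 * sq_err)))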
The data
For this project you should find your own dataset (there are lots on
the web!). What you need is a supervised learning task in which there
are several different "regimes" and the mapping from input to output is
fairly different in different regimes.
Proving your code works
You should start by debugging your code on a toy example that
you construct yourself. You need to show that your code does the right
thing on this toy example and you should describe this very briefly in
your report. Once you can learn a sensible mixture of experts on the
toy example, try some real data.
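For example, a toy problem might be a 1-D regression task with two obvious regimes, as in the following sketch (Python with numpy); a sensible mixture should end up assigning one expert to each side of zero. The gate_probs function in the commented check is a hypothetical stand-in for your trained gating net.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, size=1000)
    # Two regimes: y = 2x to the left of zero, y = -x + 1 to the right.
    y = np.where(x < 0.0, 2.0 * x, -x + 1.0) + 0.05 * rng.standard_normal(1000)

    # One way to show the code does the right thing: after training, the
    # gating net's favoured expert should be nearly constant within each
    # regime and different across the two regimes.  gate_probs(inputs) is a
    # hypothetical function returning the mixing proportions for each input.
    #
    #   left  = gate_probs(x[x < 0.0]).argmax(axis=1)
    #   right = gate_probs(x[x >= 0.0]).argmax(axis=1)
    #   assert np.bincount(left).argmax() != np.bincount(right).argmax()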
Designing the network
You have to decide how many experts to use and how complicated each
expert should be. You also have to decide how complicated the gating
network should be. This will involve some experimentation and it will
be useful to have a validation set to pick the best design and
training procedure.
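That experimentation could be organized along the lines of the sketch below, where train_moe and validation_error are hypothetical helpers standing in for your own training and evaluation code, and train_x/train_y and valid_x/valid_y are splits you have already made.

    # Pick the design with the lowest error on the held-out validation set.
    best_design, best_err = None, float("inf")
    for n_experts in (2, 3, 5, 8):              # how many experts to use
        for hidden_units in (0, 4, 8):          # 0 means a purely linear expert
            model = train_moe(train_x, train_y, n_experts, hidden_units)
            err = validation_error(model, valid_x, valid_y)
            if err < best_err:
                best_design, best_err = (n_experts, hidden_units), err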
Comparing results
In addition to training the whole system as described in the last part
of the 1991 paper on mixtures of experts, you should compare it with a
single feedforward network trained with backpropagation. If you
are working as a pair, you should also compare with a hierarchical
mixture of experts that has at least two levels of gating networks.
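For the hierarchical comparison, the prediction of a two-level mixture blends the experts' outputs using both levels of gating proportions; here is a minimal sketch (Python with numpy), assuming linear experts and softmax gates at both levels, with all shapes and names illustrative.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def hme_predict(x, top_gate, group_gates, expert_weights):
        # top_gate maps x to mixing proportions over groups of experts;
        # group_gates[i] maps x to proportions over the experts in group i;
        # expert_weights[i][j] holds the weights of expert j in group i.
        y = 0.0
        for i, g_i in enumerate(softmax(top_gate @ x)):
            for j, g_ij in enumerate(softmax(group_gates[i] @ x)):
                y = y + g_i * g_ij * (expert_weights[i][j] @ x)
        return y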