Project on Mixtures of Experts

Mixtures of Experts will be explained in the lecture on Oct 15. Instead of training one big neural net to deal with every training case, a mixture of experts uses several small neural nets, each of which is good at a particular subset of the cases. A "manager" or "gating network" decides which small net should be used for each case. The gating net and the expert nets are all trained at the same time to minimize one big objective function.
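
To make these pieces concrete, here is a minimal numpy sketch of one forward pass and of an error function in the spirit of the one described in the last part of the 1991 paper mentioned below under "Comparing results". The linear experts, the linear softmax gate, and all of the names here are illustrative choices only; your experts and gating net can be any small networks.

    import numpy as np

    def softmax(z):
        z = z - z.max()                      # subtract the max for numerical stability
        e = np.exp(z)
        return e / e.sum()

    def moe_forward(x, expert_weights, gate_weights):
        """One forward pass for a single case x.
        expert_weights: list of (d_out, d_in) matrices, one linear expert each.
        gate_weights:   (n_experts, d_in) matrix for a linear softmax gate."""
        outputs = np.stack([W @ x for W in expert_weights])    # (n_experts, d_out)
        gates = softmax(gate_weights @ x)                      # (n_experts,)
        prediction = gates @ outputs                           # gated blend of the experts
        return outputs, gates, prediction

    def moe_error(outputs, gates, target):
        """Negative log of a gated mixture of Gaussian errors,
        -log sum_i p_i exp(-0.5 * ||d - o_i||^2), up to constants."""
        sq_err = np.sum((target - outputs) ** 2, axis=1)       # one squared error per expert
        return -np.log(np.sum(gates * np.exp(-0.5 * sq_err)) + 1e-12)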

The data

For this project you should find your own dataset (there are lots on the web!). What you need is a supervised learning task in which there are several different "regimes" and the mapping from input to output is fairly different in different regimes. For example, a task where the input-output relationship depends strongly on which of a few categories, seasons, or operating conditions a case comes from would be a good fit.

Proving your code works

You should start by debugging your code on a toy example that you construct yourself. You need to show that your code does the right thing on this toy example and you should describe this very briefly in your report. Once you can learn a sensible mixture of experts on the toy example, try some real data.
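
For instance, one possible toy problem (the particular regimes and the function name below are just one choice) is a one-dimensional regression task with two clearly separated regimes, as sketched here. A correctly working mixture of two experts should learn to gate on the sign of x and to fit one line with each expert.

    import numpy as np

    def make_toy_data(n=1000, noise=0.1, seed=0):
        """Two regimes: a different linear map on each side of x = 0."""
        rng = np.random.default_rng(seed)
        x = rng.uniform(-1.0, 1.0, size=(n, 1))
        y = np.where(x < 0, 2.0 * x + 1.0, -3.0 * x + 2.0)    # regime 1 vs. regime 2
        y = y + noise * rng.standard_normal(y.shape)           # a little observation noise
        return x, y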

Designing the network

You have to decide how many experts to use and how complicated each expert should be. You also have to decide how complicated the gating network should be. This will involve some experimentation and it will be useful to have a validation set to pick the best design and training procedure.
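
One simple way to organize this experimentation is a small grid search scored on the validation set, as in the sketch below. Here train_fn and val_error_fn are placeholders for whatever training and evaluation code you write, and the particular grid values are arbitrary.

    import itertools

    def pick_design(train_fn, val_error_fn,
                    n_experts_options=(2, 4, 8),
                    hidden_options=(5, 20)):
        """Try every combination in a small design grid and keep the one
        with the lowest validation error.  train_fn(n_experts, n_hidden)
        should return a trained model; val_error_fn(model) should return
        its error on the validation set."""
        best_err, best_design, best_model = float('inf'), None, None
        for n_experts, n_hidden in itertools.product(n_experts_options, hidden_options):
            model = train_fn(n_experts=n_experts, n_hidden=n_hidden)
            err = val_error_fn(model)
            if err < best_err:
                best_err, best_design, best_model = err, (n_experts, n_hidden), model
        return best_design, best_model, best_err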

Comparing results

In addition to training the whole system as described in the last part of the 1991 paper on mixtures of experts (Jacobs, Jordan, Nowlan and Hinton, "Adaptive Mixtures of Local Experts"), you should compare it with a single feedforward network trained with backpropagation. If you are working as a pair, you should also compare with a hierarchical mixture of experts that has at least two levels of gating networks.
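
To illustrate what two levels of gating means, the sketch below shows one way a prediction could be formed in a two-level hierarchy: a top-level gate chooses among groups of experts, and each group has its own gate over its experts. The linear gates and experts are placeholders for whatever small nets you actually use.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def hierarchical_moe_predict(x, top_gate_W, group_gate_Ws, group_expert_Ws):
        """Prediction for one case x with two levels of gating.
        top_gate_W:      (n_groups, d_in) matrix for the top-level gate.
        group_gate_Ws:   one (n_experts_j, d_in) gate matrix per group.
        group_expert_Ws: one list of (d_out, d_in) expert matrices per group."""
        top = softmax(top_gate_W @ x)                          # p(group | x)
        prediction = 0.0
        for j, (gate_W, experts) in enumerate(zip(group_gate_Ws, group_expert_Ws)):
            lower = softmax(gate_W @ x)                        # p(expert | group j, x)
            outputs = np.stack([W @ x for W in experts])       # each expert's output
            prediction = prediction + top[j] * (lower @ outputs)
        return prediction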