CSC321: Neural Networks
Lecture 15: Mixtures of Experts
Partitioning based on input alone versus partitioning based on input-output relationship
A picture of why averaging is bad
Making an error function that encourages specialization instead of cooperation
The mixture of experts architecture
The derivatives of the simple cost function
Another view of mixtures of experts
Giving a whole distribution as output
The probability distribution that is implicitly assumed when using squared error
The probability of the correct answer under a mixture of Gaussians
A natural error measure for a Mixture of Experts
A picture of two imaginary vowels and a mixture of two linear experts after learning
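
The outline names a simple cost function, its derivatives, and a natural error measure based on a mixture of Gaussians, but gives no formulas. The sketch below is one plausible reading, not the lecture's own code. It assumes a scalar target d, expert outputs y_i, and gating probabilities p_i obtained from a softmax over gate logits x_i. The simple cost is taken to be E = sum_i p_i (d - y_i)^2, and the natural error measure is taken to be the negative log probability of d under a mixture of unit-variance Gaussians centred on the expert outputs. All function and variable names are hypothetical.

```python
import math


def softmax(logits):
    """Turn gate logits x_i into mixing proportions p_i that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def specialization_cost(d, expert_outputs, gate_logits):
    """Assumed simple cost: E = sum_i p_i * (d - y_i)^2."""
    p = softmax(gate_logits)
    return sum(pi * (d - yi) ** 2 for pi, yi in zip(p, expert_outputs))


def specialization_grads(d, expert_outputs, gate_logits):
    """Derivatives of the assumed cost E.

    dE/dy_i = -2 * p_i * (d - y_i)
    dE/dx_i = p_i * ((d - y_i)^2 - E)
    """
    p = softmax(gate_logits)
    cost = specialization_cost(d, expert_outputs, gate_logits)
    dE_dy = [-2.0 * pi * (d - yi) for pi, yi in zip(p, expert_outputs)]
    dE_dx = [pi * ((d - yi) ** 2 - cost) for pi, yi in zip(p, expert_outputs)]
    return dE_dy, dE_dx


def mixture_nll(d, expert_outputs, gate_logits):
    """Assumed natural error: -log sum_i p_i * N(d; y_i, 1)."""
    p = softmax(gate_logits)
    norm = 1.0 / math.sqrt(2.0 * math.pi)  # unit-variance Gaussian constant
    likelihood = sum(
        pi * norm * math.exp(-0.5 * (d - yi) ** 2)
        for pi, yi in zip(p, expert_outputs)
    )
    return -math.log(likelihood)


if __name__ == "__main__":
    d, y, x = 1.0, [1.0, 3.0], [0.0, 0.0]
    print(specialization_cost(d, y, x))
    print(specialization_grads(d, y, x))
    print(mixture_nll(d, y, x))
```

Under this reading, the gate gradient raises the mixing proportion of any expert whose squared error is below the weighted average E and lowers it for the others, which is the specialization effect the outline refers to.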