CSC321: Neural Networks

Lecture 15: Mixtures of Experts

A spectrum of models

Multiple local models

Partitioning based on input alone versus partitioning based on input-output relationship

Mixtures of Experts

A picture of why averaging is bad

Making an error function that encourages specialization instead of cooperation

The mixture of experts architecture

The derivatives of the simple cost function
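
The slide title above names derivatives of a cost that rewards specialization. A minimal sketch of what such a computation could look like follows. It assumes scalar expert outputs, a softmax gate, and a cost of the form E = sum_i p_i (d - y_i)^2. The function names, and the choice to omit any factor of 1/2, are illustrative assumptions and are not taken from the slides.

```python
import math


def softmax(logits):
    """Turn gating logits into mixing proportions that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def specialization_cost(d, outputs, gate_logits):
    """E = sum_i p_i * (d - y_i)^2 for a scalar target d (assumed form)."""
    p = softmax(gate_logits)
    return sum(pi * (d - yi) ** 2 for pi, yi in zip(p, outputs))


def cost_gradients(d, outputs, gate_logits):
    """Return E, dE/dy_i for each expert output, and dE/dx_i for each gate logit."""
    p = softmax(gate_logits)
    sq = [(d - yi) ** 2 for yi in outputs]
    E = sum(pi * s for pi, s in zip(p, sq))
    # Each expert's error signal is scaled by its own mixing proportion.
    dE_dy = [-2.0 * pi * (d - yi) for pi, yi in zip(p, outputs)]
    # A gate logit is pushed up when its expert's squared error is below E.
    dE_dx = [pi * (s - E) for pi, s in zip(p, sq)]
    return E, dE_dy, dE_dx
```

Under these assumptions, an expert with a small mixing proportion receives a small gradient on that case, and the gate raises the proportion of any expert that does better than the weighted average error.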

Another view of mixtures of experts

Giving a whole distribution as output

The probability distribution that is implicitly assumed when using squared error
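
As a worked reminder of the standard relationship this title refers to: if the target d is modelled as Gaussian around the network output y, the negative log probability is squared error up to scale and an additive constant. The symbols d, y, and sigma are notation chosen here for illustration.

```latex
p(d \mid y) = \frac{1}{\sqrt{2\pi}\,\sigma}
              \exp\!\left(-\frac{(d - y)^2}{2\sigma^2}\right)
\quad\Longrightarrow\quad
-\log p(d \mid y) = \frac{(d - y)^2}{2\sigma^2} + \log\!\left(\sqrt{2\pi}\,\sigma\right)
```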

The probability of the correct answer under a mixture of Gaussians

A natural error measure for a mixture of experts
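
One common choice for such an error measure is the negative log probability of the target under a mixture of Gaussians, with one Gaussian centred on each expert's output and weighted by the gate. The sketch below assumes scalar outputs and a shared standard deviation (default 1). All function names are hypothetical, and nothing here is claimed to match the slide's exact formula.

```python
import math


def gaussian_density(d, mean, sigma=1.0):
    """Density of a 1-D Gaussian with the given mean and standard deviation."""
    z = (d - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_probability(d, means, mixing, sigma=1.0):
    """p(d) = sum_i p_i * N(d; y_i, sigma^2), with mixing proportions p_i."""
    return sum(p * gaussian_density(d, m, sigma) for p, m in zip(mixing, means))


def mixture_error(d, means, mixing, sigma=1.0):
    """Negative log probability of the target under the mixture."""
    return -math.log(mixture_probability(d, means, mixing, sigma))


def responsibilities(d, means, mixing, sigma=1.0):
    """Posterior probability that each expert generated the target d."""
    joint = [p * gaussian_density(d, m, sigma) for p, m in zip(mixing, means)]
    total = sum(joint)
    return [j / total for j in joint]
```

With a single expert whose output equals the target, `mixture_error` reduces to the Gaussian normalizing constant, which is the squared-error case with zero residual.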

What are vowels?

A picture of two imaginary vowels and a mixture of two linear experts after learning