CIAR Summer School Tutorial

Lecture 1a: Mixtures of Gaussians, EM, and Variational Free Energy

Two types of density model
(with hidden configurations h)

Clustering

The k-means algorithm
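
A minimal NumPy sketch of the two alternating steps, assuming Euclidean distance; the function and variable names are mine, not the lecture's:

    import numpy as np

    def kmeans(X, K, n_iters=100, seed=0):
        """Hard k-means: alternate the assignment and re-fitting steps."""
        rng = np.random.default_rng(seed)
        means = X[rng.choice(len(X), K, replace=False)]  # start at K random data points
        for _ in range(n_iters):
            # Assignment step: each point goes to its nearest mean.
            d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1)  # (N, K)
            assign = d2.argmin(axis=1)
            # Re-fitting step: each mean moves to the centroid of its points.
            new_means = np.array([X[assign == k].mean(axis=0)
                                  if (assign == k).any() else means[k]
                                  for k in range(K)])
            if np.allclose(new_means, means):  # converged: assignments are stable
                break
            means = new_means
        return means, assign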

Why k-means converges

Local minima

Soft k-means

Rewarding softness

The soft assignment step
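
A sketch of the soft assignment, assuming responsibilities proportional to exp(-beta * squared distance) with a single stiffness parameter beta (the parameterization and names are illustrative; np is the numpy import from the sketch above):

    def soft_assign(X, means, beta=1.0):
        """Assign each point fractionally to every mean: a softmax over -beta * d^2."""
        d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1)  # (N, K)
        logits = -beta * d2
        logits -= logits.max(axis=1, keepdims=True)  # stabilize the exponentials
        r = np.exp(logits)
        return r / r.sum(axis=1, keepdims=True)      # rows sum to 1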

"How do we find the..."

The re-fitting step
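
The matching re-fitting step, sketched with the soft assignments r from above: each mean becomes the responsibility-weighted average of all the data.

    def refit_means(X, r):
        """Move each mean to the responsibility-weighted centroid of the dataset."""
        return (r.T @ X) / r.sum(axis=0)[:, None]    # (K, D)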

Some difficulties with soft k-means

A generative view of clustering

The mixture of Gaussians generative model

Computing responsibilities
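
A sketch of the responsibility computation, i.e. the posterior p(k | x), assuming axis-aligned Gaussians and working in the log domain for numerical stability (the axis-aligned restriction is my simplification):

    def responsibilities(X, pi, means, variances):
        """Posterior probability of each Gaussian for each point: r[n, k] = p(k | x_n)."""
        log_p = (np.log(pi)[None, :]
                 - 0.5 * np.log(2 * np.pi * variances).sum(-1)[None, :]
                 - 0.5 * (((X[:, None, :] - means[None, :, :]) ** 2)
                          / variances[None, :, :]).sum(-1))  # log pi_k + log N(x; mu_k, var_k)
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        return r / r.sum(axis=1, keepdims=True)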

Computing the new mixing proportions

Computing the new means

Computing the new variances
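
The three updates above make up one M-step; a sketch, reusing the responsibilities r from the E-step:

    def m_step(X, r):
        """Re-estimate mixing proportions, means, and axis-aligned variances."""
        Nk = r.sum(axis=0)                            # effective number of points per Gaussian
        pi = Nk / len(X)                              # new mixing proportions
        means = (r.T @ X) / Nk[:, None]               # new means
        variances = (r.T @ (X ** 2)) / Nk[:, None] - means ** 2
        return pi, means, np.maximum(variances, 1e-6) # floor stops a Gaussian collapsing on one point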

How many Gaussians do we use?

Avoiding local optima

Speeding up the fitting

Proving that EM improves the log probability of the training data
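
In outline, the proof rests on Jensen's inequality: for any distribution Q(h) over hidden configurations and a data point d,

    \log p(d) = \log \sum_h Q(h)\,\frac{p(d,h)}{Q(h)}
              \;\ge\; \sum_h Q(h)\,\log \frac{p(d,h)}{Q(h)}

with equality when Q(h) = p(h | d). The E-step sets Q to the posterior, making the bound tight; the M-step re-fits the parameters to raise the bound, so the log probability of the training data can never decrease.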

An MDL approach to clustering

How many bits must we send?
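
The key counting fact: an outcome of probability p can be coded in -\log_2 p bits, so a real value x sent to an agreed quantization width t under an agreed density q costs about

    -\log_2\big(q(x)\,t\big) = -\log_2 q(x) - \log_2 t \ \text{bits}.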

Using a Gaussian agreed distribution

What is the best variance to use?

Sending a value assuming a mixture of two equal Gaussians

The bits-back argument
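
In symbols (my notation): if the sender picks h stochastically from Q, and the receiver can reconstruct those random choices after decoding, the bits used to make them are recovered, so the expected net message length is

    \sum_h Q(h)\,\big[\,C(h) + C(d \mid h)\,\big] \;-\; H(Q)

where C(\cdot) are code costs in bits and H(Q) is the entropy of Q, the "bits back".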

Using another message to make random decisions

The general case

Free Energy
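
Two equivalent ways to write the free energy of a distribution Q over hidden configurations, with energy E(h) = -\log p(d, h):

    F(Q) = \sum_h Q(h)\,E(h) - H(Q)
         = -\log p(d) + \mathrm{KL}\big(Q \,\|\, p(h \mid d)\big)

Expected energy minus entropy; equivalently, the true negative log probability plus a penalty for how far Q is from the true posterior. F is minimized, and the bound is tight, at Q(h) = p(h | d).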

A Canadian example

What is the best distribution?

EM as coordinate descent in Free Energy
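
A sketch tying the earlier pieces together: each iteration first minimizes F over Q with the parameters fixed (the E-step, using responsibilities from above), then over the parameters with Q fixed (the M-step), so the log likelihood is monotone. The log_likelihood helper is my own addition:

    def log_likelihood(X, pi, means, variances):
        """Total log p(X) under the mixture (log-sum-exp over components)."""
        log_joint = (np.log(pi)[None, :]
                     - 0.5 * np.log(2 * np.pi * variances).sum(-1)[None, :]
                     - 0.5 * (((X[:, None, :] - means[None, :, :]) ** 2)
                              / variances[None, :, :]).sum(-1))
        m = log_joint.max(axis=1, keepdims=True)
        return (m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))).sum()

    def em(X, pi, means, variances, n_iters=50):
        """Coordinate descent in F: E-step over Q, M-step over the parameters."""
        prev = -np.inf
        for _ in range(n_iters):
            r = responsibilities(X, pi, means, variances)  # E-step: F falls to -log p(X)
            pi, means, variances = m_step(X, r)            # M-step: F falls further
            ll = log_likelihood(X, pi, means, variances)
            assert ll >= prev - 1e-6                       # monotone up to numerical noise
            prev = ll
        return pi, means, variances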

The advantage of using F to understand EM

The indecisive means algorithm

An incremental EM algorithm
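
A sketch in the spirit of the incremental variant (Neal and Hinton): keep the sufficient statistics up to date and recompute one point's responsibilities at a time, so the parameters improve after every point rather than after every full sweep. The bookkeeping here is my own illustration:

    def incremental_em_pass(X, r, pi, means, variances):
        """One sweep of incremental EM over the dataset."""
        N = len(X)
        Nk, Sx, Sxx = r.sum(axis=0), r.T @ X, r.T @ (X ** 2)   # running statistics
        for n in range(N):
            # Remove point n's old contribution from the statistics.
            Nk -= r[n]; Sx -= np.outer(r[n], X[n]); Sxx -= np.outer(r[n], X[n] ** 2)
            # E-step for this single point under the current parameters.
            r[n] = responsibilities(X[n:n+1], pi, means, variances)[0]
            # Put the new contribution back and refresh the parameters at once.
            Nk += r[n]; Sx += np.outer(r[n], X[n]); Sxx += np.outer(r[n], X[n] ** 2)
            safe = np.maximum(Nk, 1e-9)[:, None]               # guard against empty Gaussians
            pi = Nk / N
            means = Sx / safe
            variances = np.maximum(Sxx / safe - means ** 2, 1e-6)
        return r, pi, means, variances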

Stochastic MDL using the wrong distribution over codes

A spectrum of representations