CSC2515 Fall 2007
Introduction to Machine Learning

Lecture 5: Mixture models, EM and variational inference

Overview

Clustering

The k-means algorithm

Why k-means converges
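
A minimal sketch of the two alternating k-means steps named in the titles above, on one-dimensional data and using only the standard library. The function name `kmeans_1d`, the toy data, and the choice to initialise the centers from randomly picked datapoints are assumptions made for illustration, not taken from the lecture. The recorded cost (summed squared distance to the assigned center) should never go up, which is the usual argument for convergence: each step can only lower the cost or leave it unchanged.

```python
import random

def kmeans_1d(xs, k, n_iters=20, seed=0):
    """Plain k-means on a list of floats; returns centers, labels, cost trace."""
    rng = random.Random(seed)
    centers = rng.sample(xs, k)  # assumption: initialise from k distinct datapoints
    labels = [0] * len(xs)
    costs = []
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: (x - centers[j]) ** 2) for x in xs]
        # Update step: each center moves to the mean of its assigned points.
        for j in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            if members:  # an empty cluster keeps its old center
                centers[j] = sum(members) / len(members)
        # Cost after both steps: summed squared distance to the assigned center.
        costs.append(sum((x - centers[lab]) ** 2 for x, lab in zip(xs, labels)))
    return centers, labels, costs

xs = [0.1, 0.3, 0.2, 5.0, 5.2, 4.9]
centers, labels, costs = kmeans_1d(xs, k=2)
```

The cost trace is non-increasing for any initialisation, but a different seed can end at a different local minimum on harder data.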

Local minima

Soft k-means

A generative view of clustering

The mixture of Gaussians generative model

Fitting a mixture of Gaussians

The E-step: Computing responsibilities

The M-step: Computing new mixing proportions

More M-step: Computing the new means

More M-step: Computing the new variances

How do we know that the updates improve things?

Why EM converges

The expected energy of a datapoint

The entropy term

The E-step chooses the assignment probabilities that minimize the cost function (with the parameters of the Gaussians held fixed)

The M-step chooses the parameters that minimize the cost function (with the assignment probabilities held fixed)
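
The two statements above can be checked numerically with a small sketch of EM for a one-dimensional mixture of Gaussians. It assumes the cost function is F = expected energy minus entropy, where the energy of assigning datapoint x to Gaussian j is taken to be -log(pi_j) - log N(x | mu_j, var_j). All function names, the toy data, the starting parameters, and the variance floor `min_var` are assumptions made for illustration. The trace records F after every E-step and every M-step; if the sketch is right, the sequence should never increase.

```python
import math

def log_gauss(x, mu, var):
    """Log density of a 1-D Gaussian with mean mu and variance var."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def free_energy(xs, resp, pis, mus, vars_):
    """F = sum over points and components of r * (energy + log r)."""
    total = 0.0
    for x, r in zip(xs, resp):
        for j, rj in enumerate(r):
            if rj > 0.0:  # r * log r tends to 0 as r tends to 0
                energy = -math.log(pis[j]) - log_gauss(x, mus[j], vars_[j])
                total += rj * (energy + math.log(rj))
    return total

def e_step(xs, pis, mus, vars_):
    """Responsibilities: normalised posterior over components for each point."""
    resp = []
    for x in xs:
        logs = [math.log(p) + log_gauss(x, m, v)
                for p, m, v in zip(pis, mus, vars_)]
        top = max(logs)  # subtract the max before exponentiating, for stability
        w = [math.exp(l - top) for l in logs]
        z = sum(w)
        resp.append([wi / z for wi in w])
    return resp

def m_step(xs, resp, min_var=1e-6):
    """Responsibility-weighted mixing proportions, means and variances."""
    k, n = len(resp[0]), len(xs)
    pis, mus, vars_ = [], [], []
    for j in range(k):
        nj = sum(r[j] for r in resp)
        mu = sum(r[j] * x for r, x in zip(resp, xs)) / nj
        var = sum(r[j] * (x - mu) ** 2 for r, x in zip(resp, xs)) / nj
        pis.append(nj / n)
        mus.append(mu)
        # Assumption: floor the variance to avoid collapse; if the floor is
        # ever hit, the monotone decrease of F is no longer guaranteed.
        vars_.append(max(var, min_var))
    return pis, mus, vars_

def run_em(xs, pis, mus, vars_, n_iters=15):
    trace = []
    for _ in range(n_iters):
        resp = e_step(xs, pis, mus, vars_)
        trace.append(free_energy(xs, resp, pis, mus, vars_))  # after E-step
        pis, mus, vars_ = m_step(xs, resp)
        trace.append(free_energy(xs, resp, pis, mus, vars_))  # after M-step
    return pis, mus, vars_, trace

xs = [0.0, 0.5, 1.0, 1.5, 4.0, 4.5, 5.0, 6.0]
pis, mus, vars_, trace = run_em(xs, [0.5, 0.5], [0.0, 1.0], [1.0, 1.0])
```

Right after an E-step the responsibilities equal the true posterior, so F at that point equals the negative log likelihood of the data under the current parameters.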

The advantage of using F to understand EM

An incremental EM algorithm

Beyond mixture models: Directed acyclic graphical models

Ways to define the conditional probabilities

What is easy and what is hard in a DAG?

Explaining away

An apparently crazy idea

Approximate inference

A trade-off between how well the model fits the data and the accuracy of inference

Two ways to derive F

An MDL approach to clustering

How many bits must we send?

Using a Gaussian agreed distribution

What is the best variance to use?

Sending a value assuming a mixture of two equal Gaussians

The bits-back argument

Using another message to make random decisions

The general case

What is the best distribution?

Free Energy

A Canadian example

EM as coordinate descent in Free Energy

Stochastic MDL using the wrong distribution over codes

How many components does a mixture need?