lec1a

EM as coordinate descent in Free Energy

• Think of each different setting of the hidden and visible variables as a “configuration”. The energy of the configuration has two terms:
  – The negative log probability of generating the hidden values
  – The negative log probability of generating the visible values from the hidden ones

• The E-step minimizes F by finding the best distribution over hidden configurations for each data point.

• The M-step holds the distribution fixed and minimizes F by changing the parameters that determine the energy of a configuration.
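
The two alternating steps can be sketched on a small example. The code below is an illustration under stated assumptions, not part of the lecture: it assumes a 1-D mixture of two Gaussians (the hidden variable is the component index), takes F to be the expected energy under the distribution q minus the entropy of q, summed over data points, and uses hypothetical names (`energy`, `free_energy`, `e_step`, `m_step`, `run_em`) and made-up synthetic data. It records F after every half-step so the coordinate-descent behaviour can be inspected.

```python
import math
import random

def energy(x, k, params):
    """Energy of the configuration (hidden=k, visible=x):
    -log p(k) - log p(x | k), assuming a 1-D Gaussian mixture."""
    pi, mu, var = params
    neg_log_prior = -math.log(pi[k])
    neg_log_lik = 0.5 * math.log(2 * math.pi * var[k]) + (x - mu[k]) ** 2 / (2 * var[k])
    return neg_log_prior + neg_log_lik

def free_energy(data, q, params):
    """F = expected energy under q minus entropy of q, summed over data points."""
    total = 0.0
    for x, qn in zip(data, q):
        for k, qnk in enumerate(qn):
            if qnk > 0.0:  # terms with q = 0 contribute nothing
                total += qnk * (energy(x, k, params) + math.log(qnk))
    return total

def e_step(data, params):
    """Parameters fixed: the q that minimizes F is proportional to exp(-energy)."""
    n_components = len(params[0])
    q = []
    for x in data:
        e = [energy(x, k, params) for k in range(n_components)]
        e_min = min(e)  # subtract the minimum for numerical stability
        w = [math.exp(-(ek - e_min)) for ek in e]
        z = sum(w)
        q.append([wk / z for wk in w])
    return q

def m_step(data, q):
    """q fixed: closed-form parameter updates that minimize F for this model.
    No variance floor is applied; this sketch assumes no component collapses."""
    n_components = len(q[0])
    pi, mu, var = [], [], []
    for k in range(n_components):
        rk = sum(qn[k] for qn in q)
        mk = sum(qn[k] * x for qn, x in zip(q, data)) / rk
        vk = sum(qn[k] * (x - mk) ** 2 for qn, x in zip(q, data)) / rk
        pi.append(rk / len(data))
        mu.append(mk)
        var.append(vk)
    return pi, mu, var

def run_em(data, params, n_iters=30):
    """Alternate M-steps and E-steps, recording F after every half-step."""
    q = e_step(data, params)
    trace = [free_energy(data, q, params)]
    for _ in range(n_iters):
        params = m_step(data, q)
        trace.append(free_energy(data, q, params))
        q = e_step(data, params)
        trace.append(free_energy(data, q, params))
    return params, q, trace

# Synthetic data (an assumption for illustration): two well-separated clusters.
random.seed(0)
data = [random.gauss(-2.0, 0.5) for _ in range(100)]
data += [random.gauss(3.0, 1.0) for _ in range(100)]

init = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])  # (mixing weights, means, variances)
params, q, trace = run_em(data, init)
print("F at start:", trace[0], "F at end:", trace[-1])
```

In this sketch, each entry of `trace` should be no larger than the one before it, since each half-step minimizes F over one set of coordinates while the other is held fixed. Immediately after an E-step, F equals the negative log-likelihood of the data under the current parameters.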