EM as coordinate descent in Free Energy
Think of each different setting of the hidden and visible
variables as a “configuration”. The energy of the
configuration has two terms:
- The negative log probability of generating the hidden values.
- The negative log probability of generating the visible values from the hidden ones.
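Written out for a single data point $\mathbf{v}$, the energy above and the free energy of a distribution $q$ over hidden configurations take the following standard form. Here F is taken to be the expected energy under $q$ minus the entropy of $q$. The second equality follows from $p(\mathbf{h},\mathbf{v}) = p(\mathbf{h}\mid\mathbf{v})\,p(\mathbf{v})$.

```latex
E(\mathbf{h}, \mathbf{v}) = -\log p(\mathbf{h}) - \log p(\mathbf{v} \mid \mathbf{h}) = -\log p(\mathbf{h}, \mathbf{v})

F(q) = \sum_{\mathbf{h}} q(\mathbf{h})\, E(\mathbf{h}, \mathbf{v})
       + \sum_{\mathbf{h}} q(\mathbf{h}) \log q(\mathbf{h})
     = -\log p(\mathbf{v}) + \mathrm{KL}\big(q(\mathbf{h}) \,\|\, p(\mathbf{h} \mid \mathbf{v})\big)
```

Because the KL term is non-negative and is zero only when $q$ equals the posterior, F is bounded below by $-\log p(\mathbf{v})$.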
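The E-step minimizes the free energy F by finding the best distribution
over hidden configurations for each data point (for exact EM, this is the
posterior over hidden configurations given that data point).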
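A minimal sketch of the E-step claim, assuming a hypothetical toy model with one binary hidden variable and made-up values for p(h) and p(v | h). The names `prior`, `likelihood`, `energy`, `free_energy` and `e_step` are illustrative only. Setting q to the posterior makes F equal to -log p(v), and any other q gives a larger F.

```python
import math

# Hypothetical toy model: one binary hidden variable h in {0, 1} and one
# observed value v. All numbers are made up for illustration.
prior = [0.3, 0.7]       # p(h)
likelihood = [0.9, 0.2]  # p(v | h) for the observed v

def energy(h):
    """E(h, v) = -log p(h) - log p(v | h)."""
    return -math.log(prior[h]) - math.log(likelihood[h])

def free_energy(q):
    """F(q) = expected energy under q minus the entropy of q."""
    expected_energy = sum(q[h] * energy(h) for h in range(2))
    neg_entropy = sum(q[h] * math.log(q[h]) for h in range(2) if q[h] > 0)
    return expected_energy + neg_entropy

def e_step():
    """Return the posterior p(h | v), the q that minimizes F."""
    joint = [prior[h] * likelihood[h] for h in range(2)]
    evidence = sum(joint)  # p(v)
    return [j / evidence for j in joint]

posterior = e_step()
neg_log_evidence = -math.log(sum(prior[h] * likelihood[h] for h in range(2)))
print("posterior:", posterior)
print("F(posterior):", free_energy(posterior), "-log p(v):", neg_log_evidence)
```

Trying any other distribution, for example `free_energy([0.5, 0.5])`, gives a value strictly above `neg_log_evidence`; the gap is the KL divergence from that distribution to the posterior.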
The M-step holds the distribution fixed and minimizes F
by changing the parameters that determine the energy of
a configuration.
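A minimal sketch of the whole loop as coordinate descent, assuming a one-dimensional mixture of two Gaussians fitted to synthetic data. The model, the data and all function names are illustrative and not taken from the original. F is recorded after every half-step. Each step exactly minimizes F over its own block of variables (q in the E-step, the parameters in the M-step), so the recorded values should never increase.

```python
import math
import random

# Hypothetical model: 1-D mixture of Gaussians. The hidden variable is the
# component index k; params = (weights, means, variances).

def log_normal(x, mu, var):
    """Log density of N(x | mu, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)

def energy(x, k, params):
    """E(h=k, v=x) = -log p(h=k) - log p(x | h=k)."""
    weights, means, variances = params
    return -math.log(weights[k]) - log_normal(x, means[k], variances[k])

def free_energy(data, q, params):
    """Sum over data points of (expected energy under q_n) minus (entropy of q_n)."""
    total = 0.0
    for x, qn in zip(data, q):
        for k, qk in enumerate(qn):
            if qk > 0.0:
                total += qk * (energy(x, k, params) + math.log(qk))
    return total

def e_step(data, params):
    """Parameters fixed: set each q_n to the posterior p(k | x_n)."""
    q = []
    for x in data:
        neg_e = [-energy(x, k, params) for k in range(len(params[0]))]
        top = max(neg_e)  # subtract the max for numerical stability
        unnorm = [math.exp(a - top) for a in neg_e]
        z = sum(unnorm)
        q.append([u / z for u in unnorm])
    return q

def m_step(data, q):
    """q fixed: weighted maximum-likelihood weights, means and variances."""
    weights, means, variances = [], [], []
    for k in range(len(q[0])):
        nk = sum(qn[k] for qn in q)
        mu = sum(qn[k] * x for qn, x in zip(q, data)) / nk
        var = sum(qn[k] * (x - mu) ** 2 for qn, x in zip(q, data)) / nk
        weights.append(nk / len(data))
        means.append(mu)
        variances.append(var)
    return weights, means, variances

random.seed(0)
data = ([random.gauss(-2.0, 1.0) for _ in range(100)]
        + [random.gauss(3.0, 0.5) for _ in range(100)])
params = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])  # arbitrary starting point

q = e_step(data, params)
history = [free_energy(data, q, params)]
for _ in range(50):
    params = m_step(data, q)  # M-step: change parameters, q held fixed
    history.append(free_energy(data, q, params))
    q = e_step(data, params)  # E-step: change q, parameters held fixed
    history.append(free_energy(data, q, params))

print("F at start:", history[0], "F at end:", history[-1])
print("fitted means:", params[1])
```

The sketch includes no safeguard against a component collapsing onto a single data point (variance going to zero). A practical implementation would add one, at the cost of the M-step no longer being an exact minimizer of F.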