













• Think of each different setting of the hidden and visible variables as a “configuration”. The energy of the configuration has two terms:




  – The negative log probability of generating the hidden values
  – The negative log probability of generating the visible values from the hidden ones
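
Written as a formula, the two terms add up to the negative log of the joint probability of the configuration. The symbols h (hidden values), v (visible values) and p (the generative model's distribution) are notation assumed here for illustration, not taken from the slide:

$$
E(h, v) = -\log p(h) \; - \; \log p(v \mid h) = -\log p(h, v)
$$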



• The E-step minimizes F by finding the best distribution over hidden configurations for each data point.



• The M-step holds the distribution fixed and minimizes F by changing the parameters that determine the energy of a configuration.
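
The two steps can be sketched on a small example. The code below is a minimal illustration, not a reference implementation: it assumes a one-dimensional mixture of two Gaussians as the generative model, and it assumes the usual definition of F as expected energy minus entropy, summed over data points. All names (`energy`, `free_energy`, `e_step`, `m_step`, `run_em`) and the synthetic data are hypothetical. It also assumes no component ever loses all of its responsibility, so the divisions and logarithms stay well defined.

```python
import math
import random


def energy(v, h, pi, mu, var):
    """E(h, v) = -log p(h) - log p(v | h) for a 1-D Gaussian mixture."""
    neg_log_hidden = -math.log(pi[h])
    neg_log_visible = (0.5 * math.log(2 * math.pi * var[h])
                       + (v - mu[h]) ** 2 / (2 * var[h]))
    return neg_log_hidden + neg_log_visible


def free_energy(data, q, pi, mu, var):
    """F = sum over data points of (expected energy under q) - (entropy of q)."""
    total = 0.0
    for v, q_n in zip(data, q):
        for h, q_nh in enumerate(q_n):
            if q_nh > 0.0:  # the term q*log(q) tends to 0 as q tends to 0
                total += q_nh * (energy(v, h, pi, mu, var) + math.log(q_nh))
    return total


def e_step(data, pi, mu, var):
    """Parameters fixed: the q that minimizes F is proportional to exp(-E)."""
    q = []
    for v in data:
        e = [energy(v, h, pi, mu, var) for h in range(len(pi))]
        e_min = min(e)  # subtract the minimum for numerical stability
        w = [math.exp(-(e_h - e_min)) for e_h in e]
        z = sum(w)
        q.append([w_h / z for w_h in w])
    return q


def m_step(data, q):
    """q fixed: closed-form parameters that minimize the expected energy."""
    n = len(data)
    pi, mu, var = [], [], []
    for h in range(len(q[0])):
        r = sum(q_n[h] for q_n in q)  # total responsibility of component h
        m = sum(q_n[h] * v for q_n, v in zip(q, data)) / r
        s = sum(q_n[h] * (v - m) ** 2 for q_n, v in zip(q, data)) / r
        pi.append(r / n)
        mu.append(m)
        var.append(s)
    return pi, mu, var


def run_em(data, pi, mu, var, steps=10):
    """Alternate E and M steps, recording F after every half-step."""
    trace = []
    for _ in range(steps):
        q = e_step(data, pi, mu, var)
        trace.append(free_energy(data, q, pi, mu, var))
        pi, mu, var = m_step(data, q)
        trace.append(free_energy(data, q, pi, mu, var))
    return pi, mu, var, trace


# Synthetic data from two well-separated Gaussians (illustrative values only).
random.seed(0)
data = ([random.gauss(-2.0, 0.5) for _ in range(100)]
        + [random.gauss(3.0, 1.0) for _ in range(100)])
pi, mu, var, trace = run_em(data, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```

Because each half-step minimizes F with respect to one set of quantities while the other is held fixed, the values in `trace` should never increase. Immediately after an E-step, q equals the posterior over hidden configurations, so F at that point equals the negative log likelihood of the data.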

