• Think of each different setting of the hidden and visible variables as a “configuration”. The energy of the configuration has two terms:
  – The negative log prob of generating the hidden values
  – The negative log prob of generating the visible values from the hidden ones
• The E-step minimizes F by finding the best distribution over hidden configurations for each data point.
• The M-step holds the distribution fixed and minimizes F by changing the parameters that determine the energy of a configuration.
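The two steps above can be sketched on a toy model. The sketch assumes that F is the usual variational free energy (expected energy under the distribution over hidden configurations, minus the entropy of that distribution), and it uses a hypothetical two-component mixture of unit-variance 1-D Gaussians in which the hidden variable is the component index. The data, the starting parameters, and all function names are illustrative choices, not taken from the slides.

```python
import math

def energy(x, k, pi, mu):
    """Energy of the configuration (hidden = k, visible = x)."""
    neg_log_p_hidden = -math.log(pi[k])  # cost of generating the hidden value
    # Cost of generating the visible value from the hidden one (unit variance).
    neg_log_p_visible = 0.5 * (x - mu[k]) ** 2 + 0.5 * math.log(2 * math.pi)
    return neg_log_p_hidden + neg_log_p_visible

def free_energy(data, q, pi, mu):
    """Assumed F: expected energy minus entropy, summed over data points."""
    total = 0.0
    for x, qn in zip(data, q):
        for k, qk in enumerate(qn):
            if qk > 0.0:  # q * log(q) tends to 0 as q tends to 0
                total += qk * (energy(x, k, pi, mu) + math.log(qk))
    return total

def neg_log_likelihood(data, pi, mu):
    """-log p(data); F equals this when q is the exact posterior."""
    ks = range(len(pi))
    return -sum(math.log(sum(math.exp(-energy(x, k, pi, mu)) for k in ks))
                for x in data)

def e_step(data, pi, mu):
    """Parameters fixed: the q that minimizes F is proportional to exp(-energy)."""
    q = []
    for x in data:
        e = [energy(x, k, pi, mu) for k in range(len(pi))]
        lowest = min(e)  # subtract the minimum for numerical stability
        w = [math.exp(-(ek - lowest)) for ek in e]
        z = sum(w)
        q.append([wk / z for wk in w])
    return q

def m_step(data, q):
    """q fixed: choose the parameters that minimize the expected energy.
    Assumes every component keeps some nonzero total responsibility."""
    n_components = len(q[0])
    resp = [sum(qn[k] for qn in q) for k in range(n_components)]
    pi = [r / len(data) for r in resp]
    mu = [sum(qn[k] * x for qn, x in zip(q, data)) / resp[k]
          for k in range(n_components)]
    return pi, mu

# Illustrative data: two well-separated clusters.
data = [-2.1, -1.9, -2.0, 1.8, 2.2, 2.0]
pi, mu = [0.5, 0.5], [-1.0, 1.0]
q = [[0.5, 0.5] for _ in data]  # arbitrary starting distribution

history = [free_energy(data, q, pi, mu)]
for _ in range(10):
    q = e_step(data, pi, mu)
    history.append(free_energy(data, q, pi, mu))
    pi, mu = m_step(data, q)
    history.append(free_energy(data, q, pi, mu))

print(history)
print(pi, mu)
```

Under these assumptions, each entry of `history` is no larger than the one before it, because both steps minimize the same quantity with respect to different arguments. Right after an E-step the free energy equals the negative log likelihood of the data.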