• What if we use an approximation to the posterior distribution over hidden configurations?
  – e.g. assume the posterior factorizes into a product of distributions for each separate hidden cause.
• If we use the approximation for learning, there is no guarantee that learning will increase the probability that the model would generate the observed data.
• But there is a different function, called “variational free energy”, that is guaranteed to improve:
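A minimal numerical sketch of the bullets above. Everything in it is an assumption made for illustration and is not taken from the slides: a hypothetical toy model with two binary hidden causes and one visible unit under a noisy-OR likelihood, and the common convention F(Q) = Σ_h Q(h) [log Q(h) − log p(v, h)] for the variational free energy. It compares F for factorized approximations Q against −log p(v).

```python
import itertools
import math

# Hypothetical toy model (not from the slides): two binary hidden causes
# and one binary visible unit that is observed to be on.
PRIOR_ON = (0.3, 0.6)  # assumed p(h_i = 1) for each hidden cause


def p_visible_on(h):
    """Noisy-OR likelihood p(v = 1 | h): each active cause can turn v on."""
    leak, strength = 0.05, (0.8, 0.5)  # assumed parameters
    p_off = 1.0 - leak
    for h_i, s_i in zip(h, strength):
        if h_i:
            p_off *= 1.0 - s_i
    return 1.0 - p_off


def joint(h):
    """p(v = 1, h) = p(h) * p(v = 1 | h)."""
    p_h = 1.0
    for h_i, prior in zip(h, PRIOR_ON):
        p_h *= prior if h_i else 1.0 - prior
    return p_h * p_visible_on(h)


CONFIGS = list(itertools.product((0, 1), repeat=2))
EVIDENCE = sum(joint(h) for h in CONFIGS)  # p(v = 1)
TRUE_POSTERIOR = {h: joint(h) / EVIDENCE for h in CONFIGS}


def factorized_q(q_on):
    """Q(h) = prod_i q_i(h_i), where q_on[i] = q_i(h_i = 1)."""
    return {
        h: math.prod(q if h_i else 1.0 - q for h_i, q in zip(h, q_on))
        for h in CONFIGS
    }


def free_energy(q):
    """F(Q) = sum_h Q(h) * (log Q(h) - log p(v, h))."""
    return sum(
        q_h * (math.log(q_h) - math.log(joint(h)))
        for h, q_h in q.items()
        if q_h > 0.0
    )


neg_log_evidence = -math.log(EVIDENCE)

# Coarse grid search over factorized approximations (illustration only).
grid = [i / 10 for i in range(1, 10)]
best_f, best_q = min(
    (free_energy(factorized_q((a, b))), (a, b)) for a in grid for b in grid
)

print(f"-log p(v)           = {neg_log_evidence:.4f}")
print(f"F at true posterior = {free_energy(TRUE_POSTERIOR):.4f}")
print(f"best factorized F   = {best_f:.4f} at q = {best_q}")
```

Under this convention F(Q) equals −log p(v) plus the KL divergence from Q to the true posterior, so it can never fall below −log p(v) and matches it only when Q is the exact posterior. In this toy model the two causes are dependent once v is observed, so every factorized Q leaves a gap.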