After learning the first layer of weights:

    \log p(x) \;\ge\; \sum_{h} q(h \mid x)\,\bigl[\log p(h) + \log p(x \mid h)\bigr] \;+\; \text{entropy of } q(h \mid x)
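
This is the standard variational lower bound; a short derivation via Jensen's inequality, spelled out here under the assumption that q(h \mid x) is the recognition distribution produced by the first layer of weights:

    \log p(x) = \log \sum_{h} p(h)\, p(x \mid h)
              = \log \sum_{h} q(h \mid x)\, \frac{p(h)\, p(x \mid h)}{q(h \mid x)}
              \ge \sum_{h} q(h \mid x)\, \log \frac{p(h)\, p(x \mid h)}{q(h \mid x)} \quad \text{(Jensen's inequality)}
              = \sum_{h} q(h \mid x)\,\bigl[\log p(h) + \log p(x \mid h)\bigr] - \sum_{h} q(h \mid x)\, \log q(h \mid x),

and the final negative term is the entropy of q(h \mid x).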
If we freeze the generative weights that define the likelihood term and the recognition weights that define the distribution over hidden configurations, we get:

    \log p(x) \;\ge\; \sum_{h} q(h \mid x)\, \log p(h) \;+\; \text{const}
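
Here the constant collects the two frozen terms, which no longer change as the model of the hidden configurations is improved:

    \text{const} = \sum_{h} q(h \mid x)\, \log p(x \mid h) \;+\; \text{entropy of } q(h \mid x)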
Maximizing the RHS is equivalent to maximizing the log probability of "data" (hidden vectors h) that occurs with probability q(h \mid x).
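
As a concrete illustration, here is a minimal NumPy sketch of that greedy recipe: train a first binary RBM on the data, freeze its weights, sample hidden configurations from q(h \mid x), and train a second RBM on those samples as its "data". The CD-1 update, layer sizes, learning rate, and toy data are illustrative assumptions rather than details taken from the text above.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_bernoulli(p):
    # Draw binary samples with the given Bernoulli probabilities.
    return (rng.random(p.shape) < p).astype(float)

def train_rbm(data, n_hidden, epochs=5, lr=0.05):
    """Train a binary RBM with one-step contrastive divergence (CD-1)."""
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_vis = np.zeros(n_visible)
    b_hid = np.zeros(n_hidden)
    for _ in range(epochs):
        for v0 in data:
            # Recognition pass: q(h | x).
            ph0 = sigmoid(v0 @ W + b_hid)
            h0 = sample_bernoulli(ph0)
            # One Gibbs step for the negative phase.
            v1 = sample_bernoulli(sigmoid(h0 @ W.T + b_vis))
            ph1 = sigmoid(v1 @ W + b_hid)
            # CD-1 parameter updates.
            W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
            b_vis += lr * (v0 - v1)
            b_hid += lr * (ph0 - ph1)
    return W, b_vis, b_hid

# Toy binary data standing in for x.
data = (rng.random((200, 20)) < 0.3).astype(float)

# 1. Learn the first layer of weights.
W1, b_vis1, b_hid1 = train_rbm(data, n_hidden=15)

# 2. Freeze them and sample hidden configurations h ~ q(h | x); these samples
#    are the "data" whose log probability the next level will maximize.
hidden_data = sample_bernoulli(sigmoid(data @ W1 + b_hid1))

# 3. Train a second-level RBM on the aggregated posterior samples, improving p(h).
W2, b_vis2, b_hid2 = train_rbm(hidden_data, n_hidden=10)
```

Fitting the second-level model to samples drawn with probability q(h \mid x) is a Monte Carlo way of increasing \sum_{h} q(h \mid x)\, \log p(h), and hence of tightening the bound while the frozen terms stay constant.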