The learning rule for sigmoid belief nets
Suppose we could “observe” the
states of all the hidden units
when the net was generating the
observed data.
E.g. Generate randomly from
the net and ignore all the
times when it does not
generate data in the training set.
Keep n examples of the
hidden states for each
datavector in the training set.
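
The sampling procedure described above can be sketched as a simple rejection sampler. This is only an illustration under stated assumptions: a net with one hidden layer feeding one visible layer, binary units, and hypothetical names (`sample_net`, `collect_hidden_states`, `w_hv`, `b_h`, `b_v`, `max_tries`) that do not come from the slides.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_net(w_hv, b_h, b_v, rng):
    """Ancestral sample from an assumed two-layer net: hidden units, then visible units."""
    # Hidden units have no parents here, so only their biases matter.
    h = [1 if rng.random() < sigmoid(b) else 0 for b in b_h]
    v = []
    for i, b in enumerate(b_v):
        total = b + sum(h[j] * w_hv[j][i] for j in range(len(h)))
        v.append(1 if rng.random() < sigmoid(total) else 0)
    return h, v


def collect_hidden_states(training_set, w_hv, b_h, b_v, n, rng, max_tries=100000):
    """Keep up to n hidden-state samples per training vector; ignore all other samples."""
    kept = {tuple(v): [] for v in training_set}
    for _ in range(max_tries):
        h, v = sample_net(w_hv, b_h, b_v, rng)
        bucket = kept.get(tuple(v))
        # Samples whose visible vector is not in the training set are discarded.
        if bucket is not None and len(bucket) < n:
            bucket.append(h)
        if all(len(b) == n for b in kept.values()):
            break
    return kept


rng = random.Random(0)
kept = collect_hidden_states(
    training_set=[[1, 0], [0, 1]],
    w_hv=[[0.0, 0.0], [0.0, 0.0]],
    b_h=[0.0, 0.0],
    b_v=[0.0, 0.0],
    n=3,
    rng=rng,
)
```

Rejection becomes very wasteful as the number of visible units grows, because most generated vectors are not in the training set. That is why the slide frames this as a thought experiment ("suppose we could observe") rather than a practical method.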
For each node, maximize the log
probability of its “observed” state
given the observed states of its parents.
This minimizes the energy of
the complete configuration.
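
A minimal sketch of one such update for a single node, assuming a logistic unit with p_i = sigmoid(bias + sum_j s_j w_ji). Differentiating log p(s_i | parents) with respect to w_ji gives s_j (s_i - p_i), so one gradient ascent step adds lr * s_j * (s_i - p_i) to each weight. The names `update_node`, `log_prob`, and `lr` are hypothetical. Taking the energy of a complete configuration to be its negative log probability, raising each node's log probability lowers that energy.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def log_prob(w, bias, parent_states, s_i):
    """Log probability of the node's binary state s_i given its parents' states."""
    p = sigmoid(bias + sum(w_j * s_j for w_j, s_j in zip(w, parent_states)))
    return math.log(p if s_i == 1 else 1.0 - p)


def update_node(w, bias, parent_states, s_i, lr):
    """One gradient ascent step on log p(s_i | parents) for a single node."""
    p_i = sigmoid(bias + sum(w_j * s_j for w_j, s_j in zip(w, parent_states)))
    # The gradient with respect to w_ji is s_j * (s_i - p_i).
    new_w = [w_j + lr * s_j * (s_i - p_i) for w_j, s_j in zip(w, parent_states)]
    new_bias = bias + lr * (s_i - p_i)
    return new_w, new_bias


w, bias = [0.0, 0.0], 0.0
parents, s_i = [1, 0], 1
before = log_prob(w, bias, parents, s_i)
new_w, new_bias = update_node(w, bias, parents, s_i, lr=0.5)
after = log_prob(new_w, new_bias, parents, s_i)
```

Only weights from parents that were on (s_j = 1) change, and the step shrinks as p_i approaches the observed state s_i.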