The learning rule for sigmoid belief nets
Suppose we could “observe” the states of all the hidden units when the net was generating the observed data.
E.g. generate samples from the net at random and discard every sample whose visible states do not match a datavector in the training set.
Keep n examples of the hidden states for each datavector in the training set.
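The generate-and-discard procedure above can be sketched as follows. This is only an illustration under stated assumptions: a two-layer net with one hidden layer feeding one visible layer, and hypothetical names (`collect_hidden_states`, `W`, `b_h`, `b_v`, `max_tries`) that do not come from the slide. It is also very wasteful for any net of realistic size, since almost every sample gets discarded.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def collect_hidden_states(W, b_h, b_v, training_set, n, rng, max_tries=100_000):
    """Sample top-down from the net; keep hidden states only when the
    generated visible vector is one of the training datavectors.

    W[j, i] is the (assumed) weight from hidden unit j to visible unit i.
    Returns a dict mapping each training datavector to a list of up to n
    hidden-state vectors that generated it.
    """
    kept = {tuple(int(x) for x in v): [] for v in training_set}
    for _ in range(max_tries):
        # Hidden units have no parents here, so only their biases matter.
        h = (rng.random(b_h.shape) < sigmoid(b_h)).astype(int)
        # Each visible unit is sampled given the states of its parents.
        v = (rng.random(b_v.shape) < sigmoid(b_v + h @ W)).astype(int)
        key = tuple(int(x) for x in v)
        if key in kept and len(kept[key]) < n:
            kept[key].append(h)  # an "observed" hidden state for this datavector
        if all(len(states) == n for states in kept.values()):
            break
    return kept

rng = np.random.default_rng(0)
W = np.zeros((2, 3))
kept = collect_hidden_states(W, np.zeros(2), np.zeros(3),
                             training_set=[[0, 1, 0], [1, 1, 1]], n=3, rng=rng)
```

Each kept hidden vector is then treated as if it had been observed alongside its datavector.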
For each node, maximize the log probability of its “observed” state given the observed states of its parents.
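The per-node maximization can be written out as a short derivation. The notation is an assumption, not taken from the text: $s_i$ is the binary state of node $i$, $s_j$ the state of a parent $j$, $w_{ji}$ the weight from $j$ to $i$, and $\varepsilon$ a learning rate. Biases are omitted; they can be treated as weights from an always-on parent.

```latex
\begin{align}
p_i &\equiv p(s_i = 1 \mid \text{parents}) = \frac{1}{1 + \exp\bigl(-\sum_j s_j w_{ji}\bigr)} \\
\log p(s_i \mid \text{parents}) &= s_i \log p_i + (1 - s_i)\log(1 - p_i) \\
\frac{\partial \log p(s_i \mid \text{parents})}{\partial w_{ji}} &= s_j\,(s_i - p_i) \\
\Delta w_{ji} &= \varepsilon\, s_j\,(s_i - p_i)
\end{align}
```

The update is local: it depends only on the state of the parent and on the difference between the child's observed state and its predicted probability.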
This minimizes the energy of the complete configuration, where the energy is the negative log probability of the joint state of all the units.
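A minimal numerical sketch of this step, assuming a single layer of parents feeding a single layer of children, no biases, and hypothetical names (`energy`, `delta_rule_step`, `lr`). The energy computed here is only the children's contribution to the negative log probability of the complete configuration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def energy(W, s_parents, s_child):
    """Negative log probability of the children's states given their parents."""
    p = sigmoid(s_parents @ W)
    return -np.sum(s_child * np.log(p) + (1 - s_child) * np.log(1 - p))

def delta_rule_step(W, s_parents, s_child, lr=0.1):
    """One gradient-ascent step on log p(s_child | s_parents).

    W[j, i] is the (assumed) weight from parent j to child i.
    The change in W[j, i] is lr * s_j * (s_i - p_i).
    """
    p = sigmoid(s_parents @ W)
    return W + lr * np.outer(s_parents, s_child - p)

W = np.zeros((3, 2))
s_j = np.array([1.0, 0.0, 1.0])   # "observed" parent states
s_i = np.array([1.0, 0.0])        # "observed" child states
W_new = delta_rule_step(W, s_j, s_i)

# A small step raises the log probability of the observed states,
# so the energy of this configuration goes down.
print(energy(W, s_j, s_i), energy(W_new, s_j, s_i))
```

Weights from parents that were off (here the second parent) are left unchanged, because the gradient is multiplied by the parent's state.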
(Figure: unit j is a parent of unit i.)