Why learning is hard in a sigmoid belief net.
To learn the weights W, we need the posterior distribution over the units in the first hidden layer, given the data.
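For reference, the standard sigmoid belief net generative model (notation mine; the slide's figure only labels W, the prior, and the likelihood): each binary stochastic unit turns on with a logistic probability determined by its parents, and maximum-likelihood learning of W requires the posterior over the hidden units h given the visible data v:

\[
p(s_i = 1 \mid \mathrm{pa}(i)) = \sigma\Big(b_i + \sum_{j \in \mathrm{pa}(i)} s_j\, w_{ji}\Big),
\qquad \sigma(x) = \frac{1}{1 + e^{-x}},
\]
\[
p(h \mid v) = \frac{p(v \mid h)\, p(h)}{\sum_{h'} p(v \mid h')\, p(h')}.
\]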
Problem 1: The posterior is typically intractable because of “explaining away”: hidden causes that are independent in the prior become dependent once the data is observed, as the sketch below demonstrates.
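A minimal numerical demonstration of explaining away, with made-up parameters (the two-cause net and the values of b and w are mine, chosen only for illustration): the two hidden causes are independent in the prior, but once v = 1 is observed, learning that one cause is on makes the other much less likely, so the posterior does not factorize.

```python
import itertools
import math

def sigma(x):
    return 1.0 / (1.0 + math.exp(-x))

# Tiny sigmoid belief net: two binary hidden causes, one visible unit.
b_h = [-2.0, -2.0]   # both causes are rare a priori
w   = [ 5.0,  5.0]   # either cause alone easily turns v on
b_v = -4.0           # v stays off unless some cause is active

def joint(h1, h2, v):
    """p(h1) p(h2) p(v | h1, h2): the hiddens are independent in the prior."""
    p  = sigma(b_h[0]) if h1 else 1 - sigma(b_h[0])
    p *= sigma(b_h[1]) if h2 else 1 - sigma(b_h[1])
    pv = sigma(b_v + w[0] * h1 + w[1] * h2)
    return p * (pv if v else 1 - pv)

# Posterior over (h1, h2) given v = 1, by brute-force enumeration.
post = {hh: joint(*hh, v=1) for hh in itertools.product([0, 1], repeat=2)}
z = sum(post.values())
for hh in sorted(post):
    print(hh, round(post[hh] / z, 3))
# (1,0) and (0,1) each get ~0.42, but (1,1) only ~0.08: seeing that h1 = 1
# "explains away" h2, even though h1 and h2 are independent a priori.
```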
Problem 2: The posterior depends not only on the likelihood but also on the prior created by the higher layers. So to learn W we need to know the weights in all the higher layers, even if we are only approximating the posterior: all the weights interact (see the decomposition below).
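One way to write the coupling down (the layer superscripts are my notation): the prior that the first hidden layer h^{(1)} sees is obtained by summing out the layer above, so it depends on the higher-layer weights W^{(2)}, and through them on every layer further up:

\[
p\big(h^{(1)}\big) = \sum_{h^{(2)}} p\big(h^{(1)} \mid h^{(2)}; W^{(2)}\big)\, p\big(h^{(2)}\big),
\qquad
p\big(h^{(1)} \mid v\big) \propto p\big(v \mid h^{(1)}; W\big)\, p\big(h^{(1)}\big).
\]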
Problem 3: To get the prior for the first hidden layer, we need to integrate over all possible configurations of the variables in the higher layers. Yuk!
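A brute-force sketch of that sum for a net whose top layer is a set of independent Bernoulli units (the architecture, names, and indexing here are assumptions for illustration): the sum has 2**n2 terms, so it is hopeless for more than a few dozen top-level units.

```python
import itertools
import math

def sigma(x):
    return 1.0 / (1.0 + math.exp(-x))

def exact_prior_h1(h1, W2, b1, b2):
    """p(h1): sum over ALL 2**n2 configurations h2 of the layer above.

    W2[j][i] connects top-level unit j to first-hidden-layer unit i;
    b2 are top-level biases (the top layer is assumed factorial).
    """
    n2, total = len(b2), 0.0
    for h2 in itertools.product([0, 1], repeat=n2):
        p = 1.0
        for j in range(n2):                 # top-level prior p(h2)
            q = sigma(b2[j])
            p *= q if h2[j] else 1 - q
        for i, hi in enumerate(h1):         # p(h1 | h2)
            a = b1[i] + sum(W2[j][i] * h2[j] for j in range(n2))
            p *= sigma(a) if hi else 1 - sigma(a)
        total += p
    return total

# Already with 20 top-level units this loop runs about a million times;
# with 100 it would need 2**100 terms. Yuk indeed.
```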
[Figure: a sigmoid belief net with several layers of hidden variables stacked above the data. The weights W connect the first hidden layer to the data and define the likelihood; the hidden layers above create the prior for the first hidden layer.]