• To learn W, we need the posterior distribution over the first hidden layer.
• Problem 1: The posterior is typically intractable because of “explaining away”.
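As a toy illustration of explaining away (not from the slides; all numbers are made up), consider a minimal belief net with two rare, independent binary causes `h1, h2` and one visible unit `v`, where either cause alone is enough to turn `v` on. Enumerating the four hidden configurations gives the exact posterior and shows it does not factorize:

```python
import itertools
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical toy parameters: two rare independent causes, either of
# which is enough to switch the visible unit on.
prior = 0.1          # p(h_i = 1) for each hidden cause
w, b = 8.0, -4.0     # generative weight (shared) and visible bias

def p_v1_given(h1, h2):
    return sigmoid(w * h1 + w * h2 + b)

# Brute-force posterior p(h1, h2 | v = 1) by enumerating all hidden
# configurations -- exactly the computation that is intractable at scale.
joint = {}
for h1, h2 in itertools.product([0, 1], repeat=2):
    p_h = (prior if h1 else 1 - prior) * (prior if h2 else 1 - prior)
    joint[(h1, h2)] = p_h * p_v1_given(h1, h2)
Z = sum(joint.values())
post = {cfg: p / Z for cfg, p in joint.items()}

m1 = post[(1, 0)] + post[(1, 1)]   # marginal p(h1 = 1 | v = 1)
m2 = post[(0, 1)] + post[(1, 1)]   # marginal p(h2 = 1 | v = 1)
print(f"p(h1=1, h2=1 | v=1)        = {post[(1, 1)]:.4f}")
print(f"p(h1=1|v=1) * p(h2=1|v=1)  = {m1 * m2:.4f}")
# The joint is much smaller than the product of the marginals: given
# v = 1, the two causes are anti-correlated (one cause "explains away"
# the other), so the posterior does not factorize over the hidden units.
```

The priors are independent, but conditioning on the data couples the hidden units; this anti-correlation is what makes the true posterior hard to represent with a factorial approximation.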
• Problem 2: The posterior depends on the prior created by higher layers as well as the likelihood.
  – So to learn W, we need to know the weights in higher layers, even if we are only approximating the posterior. All the weights interact.
• Problem 3: We need to integrate over all possible configurations of the higher variables to get the prior for the first hidden layer. Yuk!
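A quick sketch of why Problem 3 is "yuk" (a made-up two-layer toy, not from the slides): to get the prior on even a single unit `h` in the first hidden layer, we must sum over every configuration of the K binary units in the layer above, and the number of terms doubles with each extra unit:

```python
import itertools
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical sizes and parameters: K binary units u in the layer
# above, one unit h in the first hidden layer, fixed toy weights.
K = 12
w = [((-1) ** k) * 0.5 for k in range(K)]  # weights from u_k down to h
b = -0.5                                   # bias on h
p_u1 = 0.3                                 # independent prior p(u_k = 1)

# Prior p(h = 1) = sum over all 2^K configurations of u -- the
# "integral over all possible configurations of the higher variables".
p_h1, n_terms = 0.0, 0
for u in itertools.product([0, 1], repeat=K):
    p_u = 1.0
    for bit in u:
        p_u *= p_u1 if bit else 1 - p_u1
    p_h1 += p_u * sigmoid(sum(wk * uk for wk, uk in zip(w, u)) + b)
    n_terms += 1

print(f"p(h=1) = {p_h1:.4f}, computed from {n_terms} = 2^{K} terms")
# Each additional unit in the higher layer doubles n_terms, so this
# exact sum is hopeless for realistically sized layers.
```

At K = 12 the sum is still feasible (4096 terms), but the exponential growth means any change to the higher-layer weights changes this prior, which is why all the weights interact when learning W.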