
















• To learn W, we need the posterior distribution in the first hidden layer.



• Problem 1: The posterior is typically intractable because of “explaining away”.



• Problem 2: The posterior depends on the prior as well as the likelihood.



– So to learn W, we need to know the weights in higher layers, even if we are only approximating the posterior. All the weights interact.



• Problem 3: We need to integrate over all possible configurations of the higher variables to get the prior for the first hidden layer. Yuk!
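
Problem 1 can be made concrete with a tiny sigmoid belief net: two hidden causes that are independent in the prior, and one visible unit. This is a minimal sketch, not taken from the slides. The function name `posterior_over_hidden` and the biases and weights (-10, +20, -20) are assumptions chosen only to make the effect visible. The posterior is computed by brute-force enumeration, which needs 2^n terms for n hidden units and so only works for toy cases.

```python
import itertools
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def posterior_over_hidden(v, hidden_bias, weights, visible_bias):
    """Exact p(h | v) for a single visible unit, by enumerating all hidden states.

    Hypothetical helper for illustration: the hidden units are independent
    in the prior, and the visible unit is a sigmoid of their weighted sum.
    """
    joint = {}
    for h in itertools.product([0, 1], repeat=len(hidden_bias)):
        # Factorial prior: each hidden unit turns on independently.
        prior = 1.0
        for h_j, b_j in zip(h, hidden_bias):
            p_on = sigmoid(b_j)
            prior *= p_on if h_j else 1.0 - p_on
        # Likelihood of the observed visible state given this hidden state.
        total_input = visible_bias + sum(w * h_j for w, h_j in zip(weights, h))
        p_v_on = sigmoid(total_input)
        likelihood = p_v_on if v else 1.0 - p_v_on
        joint[h] = prior * likelihood
    z = sum(joint.values())  # p(v): a sum over every hidden configuration
    return {h: p / z for h, p in joint.items()}


# Assumed illustrative parameters: both causes are rare, either one alone
# is enough to make the visible unit reasonably likely to turn on.
post = posterior_over_hidden(
    v=1, hidden_bias=[-10.0, -10.0], weights=[20.0, 20.0], visible_bias=-20.0
)

p_h1 = post[(1, 0)] + post[(1, 1)]  # marginal posterior of cause 1
p_h2 = post[(0, 1)] + post[(1, 1)]  # marginal posterior of cause 2
p_both = post[(1, 1)]               # joint posterior of both causes

print(f"p(h1=1|v) = {p_h1:.4f}, p(h2=1|v) = {p_h2:.4f}")
print(f"p(h1=1,h2=1|v) = {p_both:.6f} vs product of marginals {p_h1 * p_h2:.4f}")
```

With these assumed parameters each cause has a posterior marginal near one half, yet both being on together is far less likely than the product of the marginals would suggest. The causes are independent in the prior but strongly anti-correlated in the posterior, so the posterior does not factorize over hidden units.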

