• To learn W, we need the posterior distribution in the first hidden layer.
• Problem 1: The posterior is typically intractable because of “explaining away”.
• Problem 2: The posterior depends on the prior created by higher layers as well as the likelihood.
  – So to learn W, we need to know the weights in higher layers, even if we are only approximating the posterior. All the weights interact.
• Problem 3: We need to integrate over all possible configurations of the higher variables to get the prior for the first hidden layer. Yuk!
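Problems 1 and 2 can be seen on a toy net small enough to enumerate. The sketch below is illustrative only: it assumes two binary hidden causes with a factorial prior and one binary visible unit driven through a sigmoid. The weights, biases, and function names (`posterior`, `p_h1_on`, `p_h1_on_given_h2_on`) are hypothetical choices, not taken from the slide. It computes the exact posterior over the hidden units by brute force, which is only feasible because there are 2^2 hidden configurations.

```python
import itertools
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def posterior(v, w, bias_v, prior_on):
    """Exact p(h | v) for binary hidden units h and one binary visible unit v.

    w[i] is the weight from hidden unit i to v; prior_on[i] = p(h_i = 1).
    The factorial prior is an assumption standing in for "the prior created
    by higher layers". Cost is O(2^len(w)), so this only works for toy nets.
    """
    joint = {}
    for h in itertools.product([0, 1], repeat=len(w)):
        p_h = 1.0
        for h_i, p_i in zip(h, prior_on):
            p_h *= p_i if h_i else 1.0 - p_i
        p_v_on = sigmoid(bias_v + sum(w_i * h_i for w_i, h_i in zip(w, h)))
        joint[h] = p_h * (p_v_on if v == 1 else 1.0 - p_v_on)
    z = sum(joint.values())
    return {h: p / z for h, p in joint.items()}


def p_h1_on(post):
    """Marginal posterior probability that the first hidden unit is on."""
    return sum(p for h, p in post.items() if h[0] == 1)


def p_h1_on_given_h2_on(post):
    """Posterior probability that the first hidden unit is on, given the second is on."""
    return post[(1, 1)] / (post[(1, 1)] + post[(0, 1)])


# Illustrative parameters: either cause alone is enough to explain v = 1.
w, bias_v = [20.0, 20.0], -20.0

# Problem 1: with rare causes, observing v = 1 makes the two hidden units
# strongly anti-correlated, although they are independent in the prior.
rare = sigmoid(-10.0)
post_rare = posterior(1, w, bias_v, [rare, rare])
print("p(h1=1 | v=1)       =", p_h1_on(post_rare))
print("p(h1=1 | v=1, h2=1) =", p_h1_on_given_h2_on(post_rare))

# Problem 2: same weights W and same data, different prior, different posterior.
post_flat = posterior(1, w, bias_v, [0.5, 0.5])
print("p(h1=1 | v=1), flat prior =", p_h1_on(post_flat))
```

With rare causes, knowing that the second cause is on makes the first cause very unlikely, so the posterior does not factorize over hidden units. Changing only the prior shifts the posterior even though W is unchanged, which is why the first-layer weights cannot be learned in isolation from the layers above.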