lec2b

Why it's hard to learn one layer at a time

• To learn W, we need the posterior distribution in the first hidden layer.

• Problem 1: The posterior is typically intractable because of "explaining away".

• Problem 2: The posterior depends on the prior as well as the likelihood.

– So to learn W, we need to know the weights in higher layers, even if we are only approximating the posterior. All the weights interact.

• Problem 3: We need to integrate over all possible configurations of the higher variables to get the prior for the first hidden layer. Yuk!

[Figure: data at the bottom, connected by weights W to the first layer of hidden variables; further layers of hidden variables above it define the prior.]
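To make the three problems concrete, here is a minimal brute-force sketch. It assumes a tiny sigmoid belief net, which the slide itself does not spell out: binary units, where each unit turns on with probability sigmoid(bias + weighted input from the layer above). It also assumes only two hidden layers, with h2 treated as the top layer. The names W2, bv, b1, b2 and all numeric values are hypothetical. The code computes the exact posterior over the first hidden layer by enumerating every configuration, which is only feasible because the layers are tiny.

```python
import itertools
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bernoulli(p_on, state):
    # Probability of a binary unit being in `state` when it is on with probability p_on.
    return p_on if state == 1 else 1.0 - p_on


def layer_prob(lower, upper, weights, biases):
    # P(lower | upper) for one sigmoid belief-net layer (assumed model). Units in
    # `lower` are conditionally independent given `upper`; weights[j][i]
    # connects upper unit j to lower unit i.
    prob = 1.0
    for i, state in enumerate(lower):
        total = biases[i] + sum(upper[j] * weights[j][i] for j in range(len(upper)))
        prob *= bernoulli(sigmoid(total), state)
    return prob


def prior_h1(h1, W2, b1, b2):
    # Problem 3: P(h1) = sum over all 2**len(b2) configurations h2 of P(h2) P(h1 | h2).
    # In this sketch h2 is the top layer, so its units are independent with biases b2.
    total = 0.0
    for h2 in itertools.product([0, 1], repeat=len(b2)):
        p_h2 = math.prod(bernoulli(sigmoid(b), s) for b, s in zip(b2, h2))
        total += p_h2 * layer_prob(h1, h2, W2, b1)
    return total


def posterior_h1(v, W, W2, bv, b1, b2):
    # Exact P(h1 | v), proportional to P(h1) * P(v | h1). Problem 2: the prior
    # term depends on the higher weights W2 and b2, not just on W. Normalised
    # over all 2**len(b1) configurations of h1.
    joint = {}
    for h1 in itertools.product([0, 1], repeat=len(b1)):
        joint[h1] = prior_h1(h1, W2, b1, b2) * layer_prob(v, h1, W, bv)
    z = sum(joint.values())
    return {h1: p / z for h1, p in joint.items()}


# Hypothetical toy parameters: two rare hidden causes, each of which turns on
# the single data unit, which is rarely on by itself.
W = [[20.0], [20.0]]
bv = [-10.0]
b1 = [-5.0, -5.0]
b2 = [0.0]

# Higher-layer weights of zero: the prior over h1 is factorial.
post_indep = posterior_h1((1,), W, [[0.0, 0.0]], bv, b1, b2)
m0 = post_indep[(1, 0)] + post_indep[(1, 1)]  # P(cause 0 on | v)
m1 = post_indep[(0, 1)] + post_indep[(1, 1)]  # P(cause 1 on | v)
# Problem 1 (explaining away): P(both on | v) is far below m0 * m1.
print(post_indep[(1, 1)], m0 * m1)

# Same W, but the layer above now makes the two causes co-occur.
post_corr = posterior_h1((1,), W, [[10.0, 10.0]], bv, b1, b2)
print(post_corr[(1, 1)])
```

With these numbers, observing the data unit makes each cause roughly equally likely, yet the posterior puts very little mass on both being on, so it does not factorise. Changing only W2 while keeping W fixed moves most of the posterior mass onto both causes being on. Enumeration is used here purely for transparency; its cost grows exponentially with the layer widths, which is the difficulty Problem 3 points to.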