lecture 5

Approximate inference

• For models that use distributed non-linear representations, it is intractable to compute the exact posterior distribution over hidden configurations. So what happens if we use a tractable approximation to the posterior?

  – e.g. assume the posterior over hidden configurations for each datavector factorizes into a product of distributions for each separate hidden cause.

• If we use this approximation for learning, there is no guarantee that learning will increase the probability that the model would generate the observed data.

• But maybe we can find a different and sensible objective function that is guaranteed to improve at each update of the parameters.
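
One well-known candidate for such an objective is a lower bound on the log probability of the data: for any distribution Q over hidden configurations, sum_h Q(h) [log p(v, h) - log Q(h)] = log p(v) - KL(Q || p(h | v)), which can never exceed log p(v). The sketch below checks this numerically for a factorized Q. The toy model (two binary hidden causes, one sigmoid visible unit) and the names PRIOR, WEIGHTS, BIAS, log_joint, log_evidence, factorized_q and lower_bound are assumptions made for illustration; they do not come from the lecture.

```python
import itertools
import math

# Hypothetical toy model (an assumption, not from the lecture): two binary
# hidden causes with independent priors and one binary visible unit whose
# probability of being on is a sigmoid of the hidden states.
PRIOR = [0.3, 0.6]      # p(h_j = 1) for each hidden cause
WEIGHTS = [2.0, -1.5]   # hidden-to-visible weights
BIAS = -0.5             # visible bias


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def log_joint(v, h):
    """log p(v, h) for a visible value v and a tuple h of hidden states."""
    log_p_h = sum(math.log(p if hj else 1.0 - p) for p, hj in zip(PRIOR, h))
    p_v1 = sigmoid(BIAS + sum(w * hj for w, hj in zip(WEIGHTS, h)))
    return log_p_h + math.log(p_v1 if v else 1.0 - p_v1)


def log_evidence(v):
    """Exact log p(v), summing over every hidden configuration."""
    configs = itertools.product([0, 1], repeat=len(PRIOR))
    return math.log(sum(math.exp(log_joint(v, h)) for h in configs))


def factorized_q(q, h):
    """Q(h) as a product of one Bernoulli factor per hidden cause."""
    return math.prod(qj if hj else 1.0 - qj for qj, hj in zip(q, h))


def lower_bound(v, q):
    """sum_h Q(h) [log p(v, h) - log Q(h)], a lower bound on log p(v)."""
    total = 0.0
    for h in itertools.product([0, 1], repeat=len(q)):
        qh = factorized_q(q, h)
        if qh > 0.0:  # terms with Q(h) = 0 contribute nothing
            total += qh * (log_joint(v, h) - math.log(qh))
    return total


# Compare the bound with the exact log probability for a few choices of q.
v = 1
exact = log_evidence(v)
for q in [(0.5, 0.5), (0.2, 0.8), (0.6, 0.3)]:
    print(q, lower_bound(v, q), "<=", exact)
```

In this toy model the true posterior does not factorize, so no factorized Q closes the gap completely. Exact enumeration is only feasible here because there are four hidden configurations; it is used purely to check the bound.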