A learning algorithm for cortex

Approximate inference

• What if we use an approximation to the posterior distribution over hidden configurations?
    – e.g. assume the posterior factorizes into a product of distributions for each separate hidden cause.

• If we use the approximation for learning, there is no guarantee that learning will increase the probability that the model would generate the observed data.

• But there is a different function, called the “variational free energy”, that is guaranteed to improve:
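To make the factorized approximation concrete, here is a minimal Python sketch. It is an illustration only: the toy model (two binary hidden causes and one binary observation joined by a noisy-OR), every number in it, and all function names are assumptions, not part of the lecture. It also assumes the common sign convention F(q) = E_q[log q(h) − log p(v, h)], under which "improve" means "decrease" and F can never fall below −log p(v). The sketch covers only the inference step (updating one factor of q at a time), not the learning of model parameters.

```python
import itertools
import math

# Hypothetical toy model: all values below are made up for illustration.
PRIOR = (0.2, 0.3)    # p(h_i = 1) for each hidden cause
WEIGHT = (0.9, 0.8)   # chance that an active cause turns the observation on
LEAK = 0.05           # chance the observation is on with no active cause
CONFIGS = list(itertools.product((0, 1), repeat=2))


def log_joint(h):
    """log p(v = 1, h) under the noisy-OR model."""
    log_prior = 0.0
    p_off = 1.0 - LEAK
    for h_i, prior_i, w_i in zip(h, PRIOR, WEIGHT):
        log_prior += math.log(prior_i if h_i else 1.0 - prior_i)
        if h_i:
            p_off *= 1.0 - w_i
    return log_prior + math.log(1.0 - p_off)


def free_energy(q):
    """F(q) = E_q[log q(h) - log p(v, h)], with q[i] = q_i(h_i = 1)."""
    total = 0.0
    for h in CONFIGS:
        q_h = 1.0
        for h_i, q_i in zip(h, q):
            q_h *= q_i if h_i else 1.0 - q_i
        if q_h > 0.0:
            total += q_h * (math.log(q_h) - log_joint(h))
    return total


def update_factor(q, i):
    """Set q_i to its optimum while the other factor is held fixed."""
    j = 1 - i
    scores = []
    for h_i in (0, 1):
        expected = 0.0
        for h_j in (0, 1):
            h = [0, 0]
            h[i], h[j] = h_i, h_j
            q_j = q[j] if h_j else 1.0 - q[j]
            expected += q_j * log_joint(tuple(h))
        scores.append(expected)  # E_{q_j}[log p(v, h)] for this h_i
    new_q = list(q)
    new_q[i] = 1.0 / (1.0 + math.exp(scores[0] - scores[1]))
    return new_q


def run_mean_field(n_sweeps=20):
    """Return the free energy after each single-factor update."""
    q = [0.5, 0.5]
    history = [free_energy(q)]
    for _ in range(n_sweeps):
        for i in (0, 1):
            q = update_factor(q, i)
            history.append(free_energy(q))
    return history


# Exact -log p(v = 1), feasible here because there are only 4 configurations.
neg_log_evidence = -math.log(sum(math.exp(log_joint(h)) for h in CONFIGS))
history = run_mean_field()
print("free energy, first and last:", history[0], history[-1])
print("-log p(v):", neg_log_evidence)
```

In this sketch the free energy never increases from one update to the next, and it stays strictly above −log p(v). The remaining gap is the KL divergence between the factorized q and the true posterior, which cannot reach zero here because the two causes compete to explain the same observation, so the true posterior does not factorize.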