By using the variational bound, we can learn sigmoid belief nets.
If we add bottom-up recognition connections to a generative sigmoid
belief net, we get a nice neural network model that requires a wake
phase and a sleep phase.
The activation rules and the learning rules are very simple in
both phases. This makes neuroscientists happy.
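The two phases can be sketched in a few lines of NumPy. This is a minimal illustration under assumed details, not the original implementation: the single hidden layer, layer sizes, learning rate, and toy training pattern are all choices made here. Each weight update is a simple local delta rule on sampled binary states, which is what makes the rules "very simple" in both phases.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p):
    # stochastic binary units: fire with probability p
    return (rng.random(p.shape) < p).astype(float)

class WakeSleepNet:
    """Sigmoid belief net with one hidden layer (sizes chosen for illustration).

    R holds the bottom-up recognition weights, G the top-down generative
    weights; b_h is the generative prior bias on hidden units, b_v the
    generative bias on visible units.
    """
    def __init__(self, n_vis, n_hid, eps=0.1):
        self.R = rng.normal(0, 0.1, (n_vis, n_hid))  # recognition: v -> h
        self.G = rng.normal(0, 0.1, (n_hid, n_vis))  # generative: h -> v
        self.b_h = np.zeros(n_hid)
        self.b_v = np.zeros(n_vis)
        self.eps = eps

    def step(self, v):
        # Wake phase: recognition weights pick binary hidden states, then
        # the generative weights learn to reconstruct the data (delta rule).
        h = sample(sigmoid(v @ self.R))
        self.G += self.eps * np.outer(h, v - sigmoid(h @ self.G + self.b_v))
        self.b_v += self.eps * (v - sigmoid(h @ self.G + self.b_v))
        self.b_h += self.eps * (h - sigmoid(self.b_h))
        # Sleep phase: generate a fantasy top-down, then the recognition
        # weights learn to recover its hidden causes (delta rule).
        h_f = sample(sigmoid(self.b_h))
        v_f = sample(sigmoid(h_f @ self.G + self.b_v))
        self.R += self.eps * np.outer(v_f, h_f - sigmoid(v_f @ self.R))

net = WakeSleepNet(n_vis=6, n_hid=4)
v = np.array([1., 1., 1., 0., 0., 0.])
for _ in range(500):
    net.step(v)

# after training, the generative model should reconstruct the pattern
h = sample(sigmoid(v @ net.R))
recon = sigmoid(h @ net.G + net.b_v)
```

Note that no update ever requires information from more than the two layers a weight connects, which is the locality property that makes the algorithm neurally plausible.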
But there are problems:
The learning of the recognition weights in the sleep phase does not
quite follow the gradient of the variational bound: because it trains
the recognition weights on fantasies, it effectively minimizes
KL(p||q) rather than the KL(q||p) that appears in the bound.
Even if we could follow the right gradient, the variational
approximation might be so crude that it severely limits what we
can learn.
Variational learning works because it searches for regions of the
parameter space in which the variational bound is fairly tight,
even if this means settling for a model that assigns lower log
probability to the data.
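This trade-off can be stated precisely with the standard decomposition of the log probability into the bound plus a KL gap (a textbook identity, written here in generic notation for visible state $v$ and hidden state $h$):

$$
\log p(v) \;=\; \underbrace{\sum_h q(h\mid v)\,\log p(v,h) \;+\; H\big(q(h\mid v)\big)}_{\text{variational bound}} \;+\; \mathrm{KL}\big(q(h\mid v)\,\big\|\,p(h\mid v)\big).
$$

Since the KL term is non-negative, the bound can be raised either by raising $\log p(v)$ or by moving the parameters to where $q$ matches the true posterior; learning may therefore prefer regions where the gap is small even when $\log p(v)$ itself is lower.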