NIPS 2007 Tutorial on Deep Belief Nets

The variational bound


Each time we replace the prior over the hidden units by a better
prior, the variational bound improves by the expected difference in
the log probability that the two priors assign to the hidden states.
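
The equation that accompanied this statement did not survive in this copy. As a hedged reconstruction, the standard variational bound for a model with one hidden layer can be written as below. The symbols v, h, Q, p_old and p_new are assumed notation, not taken from the slide.

```latex
% Variational lower bound on the log probability of a data vector v,
% valid for any distribution Q(h | v) over hidden states h.
\log p(v) \;\ge\; \sum_{h} Q(h \mid v)\,\bigl[\log p(h) + \log p(v \mid h)\bigr]
          \;-\; \sum_{h} Q(h \mid v)\,\log Q(h \mid v)

% Holding Q(h | v) and p(v | h) fixed and swapping the prior p_old(h)
% for p_new(h) changes the bound by
\Delta \;=\; \sum_{h} Q(h \mid v)\,\bigl[\log p_{\mathrm{new}}(h) - \log p_{\mathrm{old}}(h)\bigr]
```

Only the term containing log p(h) depends on the prior, which is why the gain reduces to a difference of log probabilities averaged under Q.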


Now we cancel out all of the partition functions except the top one
and replace log probabilities by goodnesses using the fact that:
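
The identity referred to here is missing from this copy. A plausible form, assuming "goodness" means the negative free energy of an energy-based model with energy E(v, h) and partition function Z, is sketched below. The symbols G, E and Z are assumed notation.

```latex
% Log probability expressed as goodness minus the log partition function.
\log p(x) \;=\; G(x) - \log Z

% Goodness of a visible vector: hidden units summed out.
G(v) \;=\; \log \sum_{h} \exp\bigl(-E(v, h)\bigr)

% Goodness of a hidden vector: visible units summed out.
G(h) \;=\; \log \sum_{v} \exp\bigl(-E(v, h)\bigr)
```

Under this reading, a difference of two log probabilities under the same model is a difference of two goodnesses, because the log Z terms cancel.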


The resulting bound has simple derivatives that give a more justifiable
fine-tuning algorithm than contrastive wake-sleep.
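
As a sanity check on the two ideas this slide relies on, the sketch below enumerates every state of a tiny RBM. It checks that log p(v) equals goodness minus log Z, and that the variational bound never exceeds log p(v) and is tight when Q is the exact posterior. The 2x2 network, its weights, and all function names are hypothetical choices made for illustration, not part of the tutorial.

```python
import itertools
import math

# Hypothetical 2x2 RBM with hand-picked parameters (not from the tutorial).
W = [[0.5, -1.0], [1.5, 0.3]]  # W[i][j] couples visible unit i and hidden unit j
B_VIS = [0.1, -0.2]
B_HID = [0.0, 0.4]
STATES = list(itertools.product([0, 1], repeat=2))  # all binary vectors of length 2


def energy(v, h):
    """Standard RBM energy E(v, h) for binary units."""
    bias = sum(b * x for b, x in zip(B_VIS, v)) + sum(b * x for b, x in zip(B_HID, h))
    pair = sum(v[i] * W[i][j] * h[j] for i in range(2) for j in range(2))
    return -(bias + pair)


# Log partition function, computed by brute-force enumeration.
LOG_Z = math.log(sum(math.exp(-energy(v, h)) for v in STATES for h in STATES))


def goodness(v):
    """G(v) = log sum_h exp(-E(v, h)), the negative free energy of v."""
    return math.log(sum(math.exp(-energy(v, h)) for h in STATES))


def log_p(v):
    """Exact log p(v), computed by summing normalized joint probabilities."""
    return math.log(sum(math.exp(-energy(v, h) - LOG_Z) for h in STATES))


def posterior(v):
    """Exact p(h | v) as a dict over hidden states."""
    return {h: math.exp(-energy(v, h) - goodness(v)) for h in STATES}


def bound(v, q):
    """Variational lower bound on log p(v) for a distribution q over hidden states."""
    expected_log_joint = sum(q[h] * (-energy(v, h) - LOG_Z) for h in STATES)
    entropy = -sum(q[h] * math.log(q[h]) for h in STATES if q[h] > 0)
    return expected_log_joint + entropy


v = (1, 0)
uniform = {h: 0.25 for h in STATES}
print(log_p(v), goodness(v) - LOG_Z)              # the two values agree
print(bound(v, uniform), bound(v, posterior(v)))  # first is lower, second matches log p(v)
```

Brute-force enumeration is only feasible for a handful of units. It is used here purely to make the identities checkable, not as a training procedure.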