CSC2535 Spring 2008
Lecture 6: Recent developments in Deep Belief Nets
Training a deep network
(a quick overview)
Using backpropagation for fine-tuning
Contrastive divergence learning for RBM’s
(the simple and not very satisfactory story)
When should we use mean-field approximations?
A justification for using the real-valued hidden probabilities of one RBM as data for the next RBM
Some questions about CD learning
Using discriminative performance as an indirect measure of density
Using free-energies of two models for discrimination
Estimating the partition function
Creating the sequence of distributions for an RBM
The details of annealed importance sampling will be explained in the tutorial
Which version of CD learning works best for density modeling?
Persistent CD (Tijmen Tieleman)
Contrastive divergence as an adversarial game
The objective function for persistent CD
Contrastive divergence: the old story
How persistent CD moves between the modes of the model’s distribution
Show Tijmen’s demonstrations of the behaviour of persistent CD for learning very simple models
A political analogy to persistent CD
(just to help you remember the idea)
Full Boltzmann machine learning in multi-layer networks
Optimizing the variational bound for a deep belief net
A picture of the fine-tuning procedure
Why this fine-tuning procedure is neat
Some top-down effects in perception
Figuring out the derivatives of the variational bound
The two different derivatives of the variational bound
A stack of RBM’s
(Yee-Whye Teh’s picture)
The derivatives of the bound due to changes in G
Expected changes in energy caused by changing the probability of turning on a unit
Combining the via-Q derivatives from the higher and lower RBM’s