CSC2535 Spring 2008

Lecture 6: Recent developments in Deep Belief Nets

Training a deep network
(a quick overview)

Discriminative fine-tuning

Using backpropagation for fine-tuning

Generative fine-tuning

Contrastive divergence learning for RBMs
(the simple and not very satisfactory story)
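A minimal sketch of the simple CD-1 story for a binary RBM follows. It uses one training vector per update and pure Python, so it is an illustration rather than the lecture's implementation. The names `cd1_update`, `W`, `b_vis`, `b_hid` and the choice to use hidden probabilities (not samples) in the statistics are assumptions made here.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bernoulli(p, rng):
    # Sample a binary state that is 1 with probability p
    return 1.0 if rng.random() < p else 0.0

def cd1_update(W, b_vis, b_hid, v0, lr, rng):
    """Apply one CD-1 update, in place, for a single binary training vector v0.

    W[i][j] is the (hypothetical) weight between visible unit i and hidden unit j.
    Returns the sampled one-step reconstruction v1.
    """
    n_vis, n_hid = len(b_vis), len(b_hid)
    # Positive phase: hidden probabilities and a binary sample, driven by the data
    ph0 = [sigmoid(b_hid[j] + sum(v0[i] * W[i][j] for i in range(n_vis))) for j in range(n_hid)]
    h0 = [bernoulli(p, rng) for p in ph0]
    # One full step of alternating Gibbs sampling produces the reconstruction
    pv1 = [sigmoid(b_vis[i] + sum(h0[j] * W[i][j] for j in range(n_hid))) for i in range(n_vis)]
    v1 = [bernoulli(p, rng) for p in pv1]
    ph1 = [sigmoid(b_hid[j] + sum(v1[i] * W[i][j] for i in range(n_vis))) for j in range(n_hid)]
    # Learning rule: <v_i h_j>_data - <v_i h_j>_recon (hidden probabilities reduce sampling noise)
    for i in range(n_vis):
        for j in range(n_hid):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        b_vis[i] += lr * (v0[i] - v1[i])
    for j in range(n_hid):
        b_hid[j] += lr * (ph0[j] - ph1[j])
    return v1
```

Usage: initialise `W` with small random values and the biases at zero, then call `cd1_update(W, b_vis, b_hid, v, 0.1, random.Random(0))` repeatedly over the training vectors. The negative statistics come from a chain started at the data and run for only one step, which is what makes the story "not very satisfactory" as a density-learning method.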

When should we use mean-field approximations?

A justification for using the real-valued hidden probabilities of one RBM as data for the next RBM

Some questions about CD learning

Using discriminative performance as an indirect measure of density

Using free-energies of two models for discrimination
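As a sketch of how free energies can be used for discrimination, assume one binary RBM is trained per class and a test vector is assigned to the class whose model gives it the lowest free energy. Because the partition functions differ between models, this sketch adds an optional per-class offset (a hypothetical parameter standing in for each model's log Z); with no offsets, the partition functions are assumed equal.

```python
import math

def softplus(x):
    # Numerically stable log(1 + exp(x))
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def free_energy(v, W, b_vis, b_hid):
    """F(v) = -sum_i b_i v_i - sum_j log(1 + exp(c_j + sum_i v_i W_ij)) for a binary RBM."""
    n_vis = len(v)
    vis_term = sum(b_vis[i] * v[i] for i in range(n_vis))
    hid_term = sum(softplus(b_hid[j] + sum(v[i] * W[i][j] for i in range(n_vis)))
                   for j in range(len(b_hid)))
    return -vis_term - hid_term

def classify(v, models, offsets=None):
    """Pick the class whose RBM assigns v the highest (approximate) log probability.

    models is a list of (W, b_vis, b_hid) tuples, one RBM per class.
    log p_k(v) = -F_k(v) - log Z_k, so we minimise F_k(v) + offsets[k].
    """
    if offsets is None:
        offsets = [0.0] * len(models)
    scores = [free_energy(v, *m) + off for m, off in zip(models, offsets)]
    return min(range(len(models)), key=lambda k: scores[k])
```

The offsets matter whenever the models' partition functions differ; one option is to treat them as extra parameters and fit them discriminatively on held-out data.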

Estimating the partition function

Annealed Importance Sampling

Creating the sequence of distributions for an RBM
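A minimal sketch of annealed importance sampling for a small binary RBM follows. It assumes the simplest sequence of intermediate distributions: all parameters are scaled by an inverse temperature beta running from 0 (uniform over all configurations, so log Z_0 = (n_vis + n_hid) log 2) to 1 (the target RBM). This may differ from the sequence used in the lecture. All function names are hypothetical, and `exact_log_z` enumerates every visible vector, so it is only usable as a check on tiny models.

```python
import math
import random
from itertools import product

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softplus(x):
    # Numerically stable log(1 + exp(x))
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def free_energy(v, W, b_vis, b_hid, beta=1.0):
    """Free energy of a binary RBM whose parameters are all scaled by beta."""
    n_vis, n_hid = len(b_vis), len(b_hid)
    vis = beta * sum(b_vis[i] * v[i] for i in range(n_vis))
    hid = sum(softplus(beta * (b_hid[j] + sum(v[i] * W[i][j] for i in range(n_vis))))
              for j in range(n_hid))
    return -vis - hid

def gibbs_step(v, W, b_vis, b_hid, beta, rng):
    """One v -> h -> v Gibbs step that leaves the beta-scaled RBM invariant."""
    n_vis, n_hid = len(b_vis), len(b_hid)
    h = [1.0 if rng.random() < sigmoid(beta * (b_hid[j] + sum(v[i] * W[i][j] for i in range(n_vis))))
         else 0.0 for j in range(n_hid)]
    return [1.0 if rng.random() < sigmoid(beta * (b_vis[i] + sum(h[j] * W[i][j] for j in range(n_hid))))
            else 0.0 for i in range(n_vis)]

def ais_log_z(W, b_vis, b_hid, n_betas=500, n_runs=100, seed=0):
    rng = random.Random(seed)
    n_vis, n_hid = len(b_vis), len(b_hid)
    betas = [k / (n_betas - 1) for k in range(n_betas)]
    log_z0 = (n_vis + n_hid) * math.log(2.0)  # beta = 0: every configuration equally likely
    log_ws = []
    for _ in range(n_runs):
        v = [float(rng.random() < 0.5) for _ in range(n_vis)]  # exact sample from the base
        log_w = 0.0
        for k in range(1, n_betas):
            # Log ratio of unnormalised probabilities at consecutive temperatures
            log_w += (free_energy(v, W, b_vis, b_hid, betas[k - 1])
                      - free_energy(v, W, b_vis, b_hid, betas[k]))
            # Move v with a transition operator that leaves p_{beta_k} invariant
            v = gibbs_step(v, W, b_vis, b_hid, betas[k], rng)
        log_ws.append(log_w)
    # log Z ~= log Z_0 + log(mean importance weight), computed stably
    m = max(log_ws)
    return log_z0 + m + math.log(sum(math.exp(w - m) for w in log_ws) / n_runs)

def exact_log_z(W, b_vis, b_hid):
    """Brute-force log partition function (tiny models only)."""
    terms = [-free_energy(v, W, b_vis, b_hid) for v in product([0.0, 1.0], repeat=len(b_vis))]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))
```

On a model small enough to enumerate, `ais_log_z` and `exact_log_z` should agree closely; the point of AIS is that only the former scales to realistic RBMs.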

The details of annealed importance sampling will be explained in the tutorial

Which version of CD learning works best for density modeling?

A remote analogy

An improved version of CD

Persistent CD (Tijmen Tieleman)
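A minimal sketch of persistent CD for a binary RBM is shown below, in pure Python. The only change from CD-1 is in the negative phase: a set of "fantasy particles" is initialised once and advanced by one Gibbs step per parameter update instead of being restarted at the data. The function names, the number of chains, and the full-batch positive phase are assumptions made for this illustration.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_probs(v, W, b_hid):
    return [sigmoid(b_hid[j] + sum(v[i] * W[i][j] for i in range(len(v)))) for j in range(len(b_hid))]

def visible_probs(h, W, b_vis):
    return [sigmoid(b_vis[i] + sum(h[j] * W[i][j] for j in range(len(h)))) for i in range(len(b_vis))]

def sample(probs, rng):
    return [1.0 if rng.random() < p else 0.0 for p in probs]

def pcd_train(data, n_hid, n_chains=10, n_updates=1000, lr=0.05, seed=0):
    """Train a binary RBM with persistent CD; returns (W, b_vis, b_hid, chains)."""
    rng = random.Random(seed)
    n_vis = len(data[0])
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_hid)] for _ in range(n_vis)]
    b_vis, b_hid = [0.0] * n_vis, [0.0] * n_hid
    # Persistent fantasy particles: initialised once and never reset to the data
    chains = [sample([0.5] * n_vis, rng) for _ in range(n_chains)]
    for _ in range(n_updates):
        # Positive statistics from the whole (tiny) training set
        pos = [(v, hidden_probs(v, W, b_hid)) for v in data]
        # Negative statistics: advance each persistent chain by one Gibbs step
        chains = [sample(visible_probs(sample(hidden_probs(c, W, b_hid), rng), W, b_vis), rng)
                  for c in chains]
        neg = [(c, hidden_probs(c, W, b_hid)) for c in chains]
        for i in range(n_vis):
            for j in range(n_hid):
                grad = (sum(v[i] * ph[j] for v, ph in pos) / len(pos)
                        - sum(c[i] * ph[j] for c, ph in neg) / len(neg))
                W[i][j] += lr * grad
            b_vis[i] += lr * (sum(v[i] for v in data) / len(data)
                              - sum(c[i] for c in chains) / len(chains))
        for j in range(n_hid):
            b_hid[j] += lr * (sum(ph[j] for _, ph in pos) / len(pos)
                              - sum(ph[j] for _, ph in neg) / len(neg))
    return W, b_vis, b_hid, chains
```

Because the chains carry over between updates, each weight change raises the energy wherever the fantasy particles currently sit, which pushes them on to other modes of the model's distribution.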

Contrastive divergence as an adversarial game

The objective function for persistent CD

Contrastive divergence: the old story

How persistent CD moves between the modes of the model’s distribution

Show Tijmen’s demonstrations of the behaviour of persistent CD for learning very simple models

A political analogy to persistent CD
(just to help you remember the idea)

Full Boltzmann machine learning in multi-layer networks

Optimizing the variational bound for a deep belief net

A picture of the fine-tuning procedure

Why this fine-tuning procedure is neat

Some top-down effects in perception

Figuring out the derivatives of the variational bound

The two different derivatives of the variational bound

A stack of RBMs
(Yee-Whye Teh’s picture)

Some notation

Two expressions for G(v)

The variational bound
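For reference, the generic form of the variational bound is given below, written with v for the visible vector, h for the hidden units, and Q(h|v) for any approximating posterior. These symbols are assumed here and are not necessarily the notation (such as G) used on the slides.

```latex
\log p(\mathbf{v}) \;\ge\; \sum_{\mathbf{h}} Q(\mathbf{h}\mid\mathbf{v}) \log p(\mathbf{v},\mathbf{h})
  \;-\; \sum_{\mathbf{h}} Q(\mathbf{h}\mid\mathbf{v}) \log Q(\mathbf{h}\mid\mathbf{v})
  \;=\; \log p(\mathbf{v}) \;-\; \mathrm{KL}\!\left(Q(\mathbf{h}\mid\mathbf{v}) \,\middle\|\, p(\mathbf{h}\mid\mathbf{v})\right)
```

The bound is tight exactly when Q(h|v) equals the true posterior p(h|v), so improving the bound either raises log p(v) or makes the approximate posterior more accurate.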

Differentiating the bound

Derivatives of G


The derivatives of the bound due to changes in G

The derivatives via Q

Expected changes in energy caused by changing the probability of turning on a unit

Combining the via-Q derivatives from the higher and lower RBMs

Back-propagating the derivatives that come from changing Q