**2007 NIPS Tutorial on: Deep Belief Nets**

**Some things you will learn in this tutorial**

**A spectrum of machine learning tasks**

**Historical background: First generation neural networks**

**Second generation neural networks (~1985)**

**What is wrong with back-propagation?**

**Overcoming the limitations of back-propagation**

**Stochastic binary units (Bernoulli variables)**
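
As a concrete reminder of what such a unit computes, here is a minimal NumPy sketch (all names are illustrative, not from the tutorial): the unit turns its total input into a firing probability with the logistic function, then samples a Bernoulli state.

```python
import numpy as np

def sample_binary_units(inputs, weights, bias, rng):
    """Sample stochastic binary (Bernoulli) units.

    p(s_i = 1) = sigmoid(bias_i + sum_j inputs_j * weights[j, i])
    """
    activation = bias + inputs @ weights          # total input to each unit
    p_on = 1.0 / (1.0 + np.exp(-activation))      # logistic firing probability
    states = (rng.random(p_on.shape) < p_on).astype(float)  # Bernoulli sample
    return states, p_on

rng = np.random.default_rng(0)
inputs = np.array([1.0, 0.0, 1.0])
weights = rng.normal(scale=0.1, size=(3, 4))
states, probs = sample_binary_units(inputs, weights, np.zeros(4), rng)
```

The key point the slide makes is that the unit is stochastic: the probability is deterministic in the inputs, but the emitted state is a coin flip with that bias.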

**The learning rule for sigmoid belief nets**
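
A one-line refresher may help here. For a sigmoid belief net with binary states, the maximum-likelihood rule this slide refers to is, in the standard notation (symbols below are the conventional ones, not copied from the slide):

```latex
p_i = \sigma\!\Big(\sum_j s_j w_{ji}\Big), \qquad
\Delta w_{ji} \;\propto\; s_j \,(s_i - p_i)
```

where $j$ indexes the parents of unit $i$, the $s$'s are sampled binary states, and $\sigma$ is the logistic function: the weight from an active parent grows when unit $i$ turns on more often than its parents predict.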

**Why it is usually very hard to learn sigmoid belief nets one layer at a time**

**Two types of generative neural network**

**Restricted Boltzmann Machines (Smolensky, 1986, called them “harmoniums”)**

**The energy of a joint configuration (ignoring terms to do with biases)**
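
For reference, the standard RBM energy (ignoring bias terms, as the title says) is

```latex
E(\mathbf{v}, \mathbf{h}) \;=\; -\sum_{i \in \mathrm{vis}} \sum_{j \in \mathrm{hid}} v_i\, h_j\, w_{ij}
```

with $v_i, h_j \in \{0, 1\}$ the binary states of the visible and hidden units and $w_{ij}$ the weight between them.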

**Weights → Energies → Probabilities**

**Using energies to define probabilities**
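
The construction this slide describes is the Boltzmann distribution: every joint configuration gets a probability that falls off exponentially with its energy,

```latex
p(\mathbf{v}, \mathbf{h}) = \frac{e^{-E(\mathbf{v}, \mathbf{h})}}{\sum_{\mathbf{u}, \mathbf{g}} e^{-E(\mathbf{u}, \mathbf{g})}},
\qquad
p(\mathbf{v}) = \frac{\sum_{\mathbf{h}} e^{-E(\mathbf{v}, \mathbf{h})}}{\sum_{\mathbf{u}, \mathbf{g}} e^{-E(\mathbf{u}, \mathbf{g})}}
```

where the denominator (the partition function) sums over all joint configurations.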

**A picture of the maximum likelihood learning algorithm for an RBM**
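
The maximum-likelihood gradient contrasts pairwise statistics under the data with those under the model; the contrastive-divergence shortcut used throughout the tutorial replaces the model statistics with a one-step reconstruction. A hedged sketch of a CD-1 update for one training case (illustrative names, not the tutorial's code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b_vis, b_hid, rng, lr=0.1):
    """One CD-1 weight update for a binary RBM (illustrative sketch)."""
    # Positive phase: sample hidden states given the data vector.
    ph0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one-step reconstruction of the visibles, then hiddens.
    pv1 = sigmoid(h0 @ W.T + b_vis)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b_hid)
    # Approximate gradient: <v h>_data - <v h>_reconstruction.
    dW = np.outer(v0, ph0) - np.outer(v1, ph1)
    return W + lr * dW

rng = np.random.default_rng(1)
W = rng.normal(scale=0.01, size=(6, 3))
v0 = rng.integers(0, 2, size=6).astype(float)
W_new = cd1_update(v0, W, np.zeros(6), np.zeros(3), rng)
```

Running the full chain to equilibrium instead of stopping after one reconstruction would give the exact maximum-likelihood gradient, which is what the slide's picture depicts.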

**How to learn a set of features that are good for reconstructing images of the digit 2**

**How well can we reconstruct the digit images from the binary feature activations?**

**Three ways to combine probability density models (an underlying theme of the tutorial)**

**Training a deep network (the main reason RBMs are interesting)**
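
The greedy recipe is: train an RBM on the data, then treat its hidden-unit activities as data for the next RBM, and repeat. A toy sketch under that reading (illustrative code; biases omitted for brevity):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, rng, lr=0.05, epochs=20):
    """Crude CD-1 trainer for a binary RBM (illustrative, biases omitted)."""
    W = rng.normal(scale=0.01, size=(data.shape[1], n_hidden))
    for _ in range(epochs):
        for v0 in data:
            ph0 = sigmoid(v0 @ W)
            h0 = (rng.random(ph0.shape) < ph0).astype(float)
            v1 = (rng.random(v0.shape) < sigmoid(h0 @ W.T)).astype(float)
            ph1 = sigmoid(v1 @ W)
            W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    return W

def train_stack(data, layer_sizes, rng):
    """Greedy layer-by-layer training: each RBM models the features below it."""
    weights = []
    for n_hidden in layer_sizes:
        W = train_rbm(data, n_hidden, rng)
        weights.append(W)
        data = sigmoid(data @ W)   # hidden probabilities become the next "data"
    return weights

rng = np.random.default_rng(2)
data = rng.integers(0, 2, size=(20, 8)).astype(float)
weights = train_stack(data, [6, 4], rng)
```

Each layer is frozen once trained; only the RBM currently on top of the stack is learning.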

**The generative model after learning 3 layers**

**Why does greedy learning work? An aside: Averaging factorial distributions**

**Why does greedy learning work?**

**Why does greedy learning work? (continued)**

**Which distributions are factorial in a directed belief net?**

**Why does greedy learning fail in a directed module?**

**Fine-tuning with a contrastive version of the “wake-sleep” algorithm**

**Show the movie of the network generating digits (available at www.cs.toronto/~hinton)**

**Examples of correctly recognized handwritten digits that the neural network had never seen before**

**Unsupervised “pre-training” also helps for models that have more data and better priors**

**Another view of why layer-by-layer learning works**

**An infinite sigmoid belief net that is equivalent to an RBM**

**Inference in a directed net with replicated weights**

**Learning a deep directed network**

**"Then freeze the first layer..."**

**What happens when the weights in higher layers become different from the weights in the first layer?**

**A stack of RBMs (Yee-Whye Teh’s idea)**

**Overview of the rest of the tutorial**

**Fine-tuning for discrimination**

**Why backpropagation works better after greedy pre-training**

**First, model the distribution of digit images**

**Results on permutation-invariant MNIST task**

**Combining deep belief nets with Gaussian processes**

**Learning to extract the orientation of a face patch (Salakhutdinov & Hinton, NIPS 2007)**

**The root mean squared error in the orientation when combining GPs with deep belief nets**

**The free-energy of a mean-field logistic unit**
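
One standard form of this quantity: a logistic unit with total input $x$, held at mean activity $p$, has free energy (expected energy minus entropy)

```latex
F(p) = -x\,p + p \log p + (1 - p)\log(1 - p)
```

and minimizing over $p$ recovers the mean-field fixed point $p = \sigma(x)$.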

**An RBM with real-valued visible units**
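
The usual way to get real-valued visible units is Gaussian visibles with binary hiddens; a standard form of the energy (conventional notation, not copied from the slide) is

```latex
E(\mathbf{v}, \mathbf{h}) = \sum_i \frac{(v_i - b_i)^2}{2\sigma_i^2}
\;-\; \sum_j c_j h_j \;-\; \sum_{i,j} \frac{v_i}{\sigma_i}\, h_j\, w_{ij}
```

so that, given the hidden states, each visible unit is Gaussian with variance $\sigma_i^2$.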

**Deep Autoencoders (Hinton & Salakhutdinov, 2006)**

**A comparison of methods for compressing digit images to 30 real numbers**

**Do the 30-D codes found by the deep autoencoder preserve the class structure of the data?**

**Retrieving documents that are similar to a query document**

**How to compress the count vector**

**Performance of the autoencoder at document retrieval**

**Proportion of retrieved documents in same class as query**

**Finding binary codes for documents**

**How good is a shortlist found this way?**

**The conditional RBM model (Sutskever & Hinton, 2007)**

**Why the autoregressive connections do not cause problems**

**Generating from a learned model**

**An application to modeling motion capture data (Taylor, Roweis & Hinton, 2007)**

**Modeling multiple types of motion**

**Show Graham Taylor’s movies (available at www.cs.toronto/~hinton)**

**Generating the parts of an object**

**Semi-restricted Boltzmann Machines**

**Learning a semi-restricted Boltzmann Machine**

**Learning in semi-restricted Boltzmann Machines**

**Results on modeling natural image patches using a stack of RBMs (Osindero & Hinton)**

**Whitening the learning signal instead of the data**

**Towards a more powerful, multi-linear stackable learning module**

**Higher-order Boltzmann machines (Sejnowski, ~1986)**
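
A higher-order Boltzmann machine generalizes the pairwise energy with multiplicative interactions among three or more units; schematically,

```latex
E(\mathbf{s}) = -\sum_{i<j} w_{ij}\, s_i s_j \;-\; \sum_{i<j<k} w_{ijk}\, s_i s_j s_k
```

so one unit can gate the effective weight between two others, which is what a conditional higher-order machine exploits.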

**A picture of a conditional, higher-order Boltzmann machine (Hinton & Lang, 1985)**