CSC2515 Fall 2007
Introduction to Machine Learning
Lecture 8 Deep Belief Nets
Three ways to combine probability density models
The learning rule for sigmoid belief nets
Why it is usually very hard to learn sigmoid belief nets one layer at a time
Two types of generative neural network
The energy of a joint configuration (ignoring terms to do with biases)
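As a worked form of this slide title, the standard bias-free energy of a restricted Boltzmann machine can be written as below. The symbols are assumptions not given in this outline: v_i is the binary state of visible unit i, h_j the binary state of hidden unit j, and w_ij the weight between them.

```latex
E(\mathbf{v}, \mathbf{h}) = -\sum_{i,j} v_i \, h_j \, w_{ij},
\qquad
-\frac{\partial E(\mathbf{v}, \mathbf{h})}{\partial w_{ij}} = v_i \, h_j
```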
Weights → Energies → Probabilities
Using energies to define probabilities
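A sketch of the usual Boltzmann-distribution definition, assuming an energy function E(v, h) over visible states v and hidden states h. The summation variables u and g are illustrative names that range over all joint configurations.

```latex
p(\mathbf{v}, \mathbf{h}) = \frac{e^{-E(\mathbf{v}, \mathbf{h})}}{\sum_{\mathbf{u}, \mathbf{g}} e^{-E(\mathbf{u}, \mathbf{g})}},
\qquad
p(\mathbf{v}) = \frac{\sum_{\mathbf{h}} e^{-E(\mathbf{v}, \mathbf{h})}}{\sum_{\mathbf{u}, \mathbf{g}} e^{-E(\mathbf{u}, \mathbf{g})}}
```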
A picture of the maximum likelihood learning algorithm for an RBM
How to learn a set of features that are good for reconstructing images of the digit 2
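A minimal sketch of one-step contrastive divergence (CD-1), a common shortcut to the full maximum likelihood procedure for an RBM. It is not taken from the slides: the function names, the learning rate, the bias-free model, and the use of probabilities rather than binary samples in the reconstruction are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, rng, lr=0.1):
    """One CD-1 step for a bias-free binary RBM (illustrative sketch).

    v0: (batch, n_visible) array of binary data.
    W:  (n_visible, n_hidden) weight matrix.
    Returns the updated weights and the reconstruction probabilities.
    """
    # Positive phase: hidden probabilities and a binary sample given the data.
    p_h0 = sigmoid(v0 @ W)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Reconstruction: visible probabilities given the sampled hidden states.
    p_v1 = sigmoid(h0 @ W.T)
    # Negative phase: hidden probabilities given the reconstruction.
    p_h1 = sigmoid(p_v1 @ W)
    # Difference of pairwise statistics, averaged over the batch.
    grad = (v0.T @ p_h0 - p_v1.T @ p_h1) / v0.shape[0]
    return W + lr * grad, p_v1

# Usage on random binary vectors (hypothetical stand-ins for digit images).
rng = np.random.default_rng(0)
data = (rng.random((64, 16)) < 0.5).astype(float)
W = 0.01 * rng.standard_normal((16, 8))
W, recon = cd1_update(data, W, rng)
```

Repeating the update over many batches is what drives the reconstructions toward the data; a single step as shown only illustrates the shape of the rule.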
How well can we reconstruct the digit images from the binary feature activations?
The generative model after learning 3 layers
Why does greedy learning work?
A neural model of digit recognition
Fine-tuning with a contrastive divergence version of the “wake-sleep” algorithm
Show the movie of the network generating digits (available at www.cs.toronto/~hinton)
Examples of correctly recognized handwritten digits that the neural network had never seen before
Another view of why layer-by-layer learning works
An infinite sigmoid belief net that is equivalent to an RBM
Inference in a directed net with replicated weights
Learning a deep directed network
"Then freeze the first layer..."
What happens when the weights in higher layers become different from the weights in the first layer?
A stack of RBM’s
(Yee-Whye Teh’s picture)
Fine-tuning for discrimination
Why backpropagation works better after greedy pre-training
First, model the distribution of digit images
Results on permutation-invariant MNIST task
Combining deep belief nets with Gaussian processes
Learning to extract the orientation of a face patch (Ruslan Salakhutdinov)
The root mean squared error in the orientation when combining GP’s with deep belief nets
The free-energy of a mean-field logistic unit
An RBM with real-valued visible units
Deep Autoencoders
(Ruslan Salakhutdinov)
A comparison of methods for compressing digit images to 30 real numbers.
Do the 30-D codes found by the deep autoencoder preserve the class structure of the data?
Retrieving documents that are similar to a query document
How to compress the count vector
Performance of the autoencoder at document retrieval
Proportion of retrieved documents in same class as query
Finding binary codes for documents
Using a deep autoencoder as a hash-function for finding approximate matches
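A minimal sketch of the lookup side of this idea, assuming each document has already been mapped to a short binary code stored as an integer. The autoencoder that would produce the codes is not shown, and the helper names `build_index`, `hamming_ball`, and `shortlist` are hypothetical. Documents whose codes lie within a small Hamming distance of the query code form the shortlist.

```python
from collections import defaultdict
from itertools import combinations

def build_index(codes):
    """Map each binary code (stored as an int) to the ids of documents with that code."""
    index = defaultdict(list)
    for doc_id, code in enumerate(codes):
        index[code].append(doc_id)
    return index

def hamming_ball(code, n_bits, radius):
    """Yield every n_bits-bit code within `radius` bit flips of `code`."""
    for r in range(radius + 1):
        for positions in combinations(range(n_bits), r):
            flipped = code
            for p in positions:
                flipped ^= 1 << p
            yield flipped

def shortlist(query_code, index, n_bits, radius=1):
    """Return sorted ids of documents whose code is within `radius` of the query code."""
    found = []
    for candidate in hamming_ball(query_code, n_bits, radius):
        found.extend(index.get(candidate, []))
    return sorted(found)

# Usage with four hypothetical 4-bit codes.
codes = [0b0000, 0b0001, 0b0011, 0b1111]
index = build_index(codes)
nearby = shortlist(0b0000, index, n_bits=4, radius=1)
```

The cost of a query depends on the number of codes in the Hamming ball rather than on the size of the collection, which is why the radius is kept small.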
How good is a shortlist found this way?
Why the lateral connections do not cause problems
Generating from a learned model
An application to modeling motion capture data
Modeling multiple types of motion
Generating the parts of an object
Semi-restricted Boltzmann Machines
Learning in Semi-restricted Boltzmann Machines
Learning a semi-restricted Boltzmann Machine
Results on modeling natural image patches using a stack of RBM’s (Osindero and Hinton)
Whitening the learning signal instead of the data
Higher order Boltzmann machines
A picture of a conditional, higher-order Boltzmann machine (1985)