What kind of a Graphical Model is the Brain?

Overview

Stochastic binary neurons

Two types of unsupervised neural network

Sigmoid Belief Nets

The learning rule for sigmoid belief nets

Why learning is hard in a sigmoid belief net

How a Boltzmann Machine models data

The Energy of a joint configuration

Using energies to define probabilities

A very surprising fact

The batch learning algorithm

Four reasons why learning is impractical in Boltzmann Machines

Restricted Boltzmann Machines

A picture of the Boltzmann machine learning algorithm for an RBM

Contrastive divergence learning: A quick way to learn an RBM

Using an RBM to learn a model of a digit class

The weights learned by the 100 hidden units

A surprising relationship between Boltzmann Machines and Sigmoid Belief Nets

Using complementary priors to eliminate explaining away

An example of a complementary prior

Inference in a DAG with replicated weights

The generative model

Learning by dividing and conquering

Another way to divide and conquer

"The learning rule for a..."

Pros and cons of replicating the weights

Multilayer contrastive divergence

A simplified version with all hidden layers the same size

Why the hidden configurations should be treated as data when learning the next layer of weights

Why greedy learning works

Back-fitting

A neural network model of digit recognition

Samples generated by running the top-level RBM with one label clamped. There are 1000 iterations of alternating Gibbs sampling between samples.

Examples of correctly recognized MNIST test digits (the 49 closest calls)

How well does it discriminate on the MNIST test set with no extra information about geometric distortions?

Samples generated by running the top-level RBM with one label clamped. Initialized by an up-pass from a random binary image. 20 iterations between samples.

Learning with realistic labels

Learning with auditory labels

A different way to capture low-dimensional manifolds

THE END

The wake-sleep algorithm

The flaws in the wake-sleep algorithm

The up-down algorithm: A contrastive divergence version of wake-sleep

Mode averaging

The receptive fields of the first hidden layer

The generative fields of the first hidden layer

Independence relationships of hidden variables in three types of model