Faster-mixing chains
• Hybrid Monte Carlo can only take small steps because
the energy surface is curved.
• With a single layer of hidden units, it is possible to use
alternating parallel Gibbs sampling.
– Step 1: each Student-t hidden unit picks a variance
from the posterior distribution over variances given
the violation produced by the current datavector. If the
violation is big, it tends to pick a big variance.
• This is equivalent to picking a Gaussian from an infinite
mixture of Gaussians with the same mean but different
variances (because that’s what a Student-t is).
– With the variances fixed, each hidden unit defines a
one-dimensional Gaussian in the dataspace.
– Step 2: pick a visible vector from the product of all the
one-dimensional Gaussians.
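
The two steps above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not necessarily the parameterization used in the lecture: each hidden unit j is assumed to contribute an energy alpha * log(1 + 0.5 * (w_j . x)^2), the "violation" is taken to be the filter output w_j . x, and the filter matrix W is assumed to have at least as many rows (hidden units) as columns (visible dimensions) and full column rank, so that the product of one-dimensional Gaussians is a proper Gaussian. The names gibbs_step, W and alpha are hypothetical.

```python
import numpy as np

def gibbs_step(x, W, alpha, rng):
    """One alternating Gibbs sweep for a product of Student-t experts.

    Assumed model (an illustration, not taken from the slides):
        p(x) proportional to prod_j (1 + 0.5 * (w_j . x)**2) ** (-alpha)
    x: visible vector, shape (D,); W: filters, shape (H, D) with H >= D.
    """
    # Step 1: every hidden unit samples its variance in parallel.
    y = W @ x                              # violations, one per hidden unit
    rate = 1.0 + 0.5 * y ** 2              # a big violation gives a big rate
    u = rng.gamma(shape=alpha, scale=1.0 / rate)  # precision ~ Gamma(alpha, rate)
    variances = 1.0 / u                    # big violation -> typically big variance

    # Step 2: with variances fixed, unit j is a 1-D Gaussian along w_j.
    # Their product is a zero-mean Gaussian with precision W^T diag(u) W.
    precision = W.T @ (u[:, None] * W)
    L = np.linalg.cholesky(precision)      # precision = L @ L.T
    z = rng.standard_normal(x.shape[0])
    x_new = np.linalg.solve(L.T, z)        # covariance of x_new is precision^{-1}
    return x_new, variances

# Usage: run a short chain from a random start.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))            # 8 hidden units, 4 visible dimensions
x = rng.standard_normal(4)
for _ in range(100):
    x, variances = gibbs_step(x, W, alpha=1.5, rng=rng)
```

Because Step 2 draws the whole visible vector in one go from a full Gaussian, a single sweep can move far across the energy surface, which is the contrast with the small steps of Hybrid Monte Carlo.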