Learning relational dictionaries

Modelling two images with a single, shared set of hidden variables makes it possible to learn spatio-temporal features that represent the relationship between the images.

The key ingredient that makes this work is letting the three groups of variables (the pixels of the two images and the hidden variables) interact multiplicatively, leading to a gated sparse coding model.

Factored gated Boltzmann machines (Memisevic & Hinton, 2010) implement this idea in the form of a probabilistic model with a simple Hebbian learning rule, but the same approach carries over to other sparse coding techniques.
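As a concrete illustration, here is a minimal numpy sketch of how the hidden ("mapping") units can be inferred from an image pair in a factored model of this kind. All names and sizes (wxf, wyf, whf, the numbers of factors and hidden units) are illustrative assumptions, not the interface of the implementation linked below:

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    # Illustrative parameters: input-to-factor, output-to-factor and
    # hidden-to-factor weights, plus hidden biases. Sizes are arbitrary.
    rng = np.random.default_rng(0)
    numpix, numfac, numhid = 13 * 13, 100, 50
    wxf = 0.01 * rng.standard_normal((numpix, numfac))
    wyf = 0.01 * rng.standard_normal((numpix, numfac))
    whf = 0.01 * rng.standard_normal((numhid, numfac))
    hidbias = np.zeros(numhid)

    def hidden_probs(x, y):
        # The three groups of variables meet multiplicatively in the
        # factors: each factor sees a projection of the input image, a
        # projection of the output image, and (via whf) the hiddens.
        factors_x = x @ wxf
        factors_y = y @ wyf
        return sigmoid((factors_x * factors_y) @ whf.T + hidbias)

Because the hidden units are conditionally independent given the image pair, inference is a single feed-forward pass like this one.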

Code

A Python implementation of a factored gated Boltzmann machine is available here (requires numpy).

An example Python script that instantiates and trains the model on pairs of shifted random images is available here (uses matplotlib for plotting).
(inputimages.txt, outputimages.txt)

(Note: The image files are approximately 40MB each. You can get both image sets in a small compressed tar (tgz) file here.)
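If you would rather generate the training pairs yourself than download the files, something along the following lines produces comparable data: random-dot images together with shifted copies. Patch size, dot density and shift range here are guesses, not the settings used for the files above:

    import numpy as np

    rng = np.random.default_rng(0)

    def make_shift_pairs(numcases, size=13, maxshift=3):
        # Random-dot input images and copies shifted by a random
        # (dy, dx); wrap-around shifts keep the code short.
        inputs = (rng.random((numcases, size, size)) > 0.9).astype(float)
        outputs = np.empty_like(inputs)
        for i in range(numcases):
            dy, dx = rng.integers(-maxshift, maxshift + 1, size=2)
            outputs[i] = np.roll(inputs[i], (dy, dx), axis=(0, 1))
        return inputs.reshape(numcases, -1), outputs.reshape(numcases, -1)

    X, Y = make_shift_pairs(10000)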

Example 1: Learning a dictionary to represent translation

After about 200 epochs of training on the shifted random images (this can easily take an hour or so, so don't get discouraged), the learned filters should look something like this:


The model has learned a 2D Fourier transform as the optimal set of filters for representing translations. (Filters for the input image are shown on the left, filters for the output image on the right.)
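Plots like the ones above take only a few lines of matplotlib. Here is a rough sketch, assuming the learned input filters sit in the columns of a (numpixels x numfactors) array such as the wxf used earlier, with square patches:

    import matplotlib.pyplot as plt

    def show_filters(w, patchsize=13, gridsize=10):
        # Display the first gridsize**2 columns of w as image patches
        # (assumes w has at least that many columns).
        fig, axes = plt.subplots(gridsize, gridsize, figsize=(6, 6))
        for k, ax in enumerate(axes.flat):
            ax.imshow(w[:, k].reshape(patchsize, patchsize), cmap='gray')
            ax.axis('off')
        plt.show()

    # e.g. show_filters(wxf) for the input filters,
    #      show_filters(wyf) for the output filters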

Example 2: Learning a dictionary to represent rotation

When training on rotations, the model learns a polar ("circular") generalization of the Fourier transform:



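The rotation pairs can be generated much like the shifted pairs above, for example with scipy's image rotation. The angle range and interpolation order below are assumptions:

    import numpy as np
    from scipy.ndimage import rotate

    rng = np.random.default_rng(0)

    def make_rotation_pairs(numcases, size=13, maxangle=30.0):
        # Random-dot inputs and copies rotated by a random angle
        # about the patch centre.
        inputs = (rng.random((numcases, size, size)) > 0.9).astype(float)
        outputs = np.empty_like(inputs)
        for i in range(numcases):
            angle = rng.uniform(-maxangle, maxangle)
            outputs[i] = rotate(inputs[i], angle, reshape=False, order=1)
        return inputs.reshape(numcases, -1), outputs.reshape(numcases, -1)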
Training on scalings and other types of transformation leads to other transformation-specific basis functions.
After learning about transformations, the model can make analogies, capture what it means for two image pairs to show "the same" transformation, and perform transformation-invariant classification (Memisevic & Hinton, 2010); a sketch of analogy making follows below.
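In this kind of model, analogy making amounts to inferring the mapping units from one image pair and applying them to a new image. Continuing the illustrative parameterization from the inference sketch above (again, not the linked implementation's interface):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    # Same illustrative parameters as in the inference sketch above.
    rng = np.random.default_rng(0)
    numpix, numfac, numhid = 13 * 13, 100, 50
    wxf = 0.01 * rng.standard_normal((numpix, numfac))
    wyf = 0.01 * rng.standard_normal((numpix, numfac))
    whf = 0.01 * rng.standard_normal((numhid, numfac))
    hidbias, outbias = np.zeros(numhid), np.zeros(numpix)

    def hidden_probs(x, y):
        return sigmoid(((x @ wxf) * (y @ wyf)) @ whf.T + hidbias)

    def output_probs(x, h):
        # Conditional on an input image and mapping units, reconstruct
        # the output image through the same factors.
        return sigmoid(((x @ wxf) * (h @ whf)) @ wyf.T + outbias)

    # x1 : y1 :: x2 : ?  -- infer the transformation, then apply it.
    x1, y1, x2 = rng.random((3, numpix))
    h = hidden_probs(x1, y1)
    y2 = output_probs(x2, h)

With trained weights, y2 shows x2 transformed in the same way that x1 was transformed into y1.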



References

2010 Memisevic, R. and Hinton, G. E.
Learning to Represent Spatial Transformations with Factored Higher-Order Boltzmann Machines.
Neural Computation, June 2010, Vol. 22, No. 6: 1473-1492. [pdf], [tech-report (pdf)], [errata]
The paper introduces the model and shows some simple applications.
Basis functions that the model learned from rotations are shown on the cover of the June 2010 issue of Neural Computation:


2007 Memisevic, R. and Hinton, G. E.
Unsupervised learning of image transformations.
Computer Vision and Pattern Recognition (CVPR 2007). [pdf], [tech-report (pdf)]
Describes a simpler, non-factorized version of the model and a convolutional version.

2008 Memisevic, R.
Non-linear latent factor models for revealing structure in high-dimensional data.
Doctoral dissertation, University of Toronto. [pdf]
Describes some extensions and a non-probabilistic version of the model (among other things).