CSC2535: Advanced Machine Learning
 
Lecture 11
Learning by maximizing agreement between outputs

The aims of unsupervised learning

Temporally invariant properties

Learning temporal invariances

Some obvious measures of agreement

A new way to get a teaching signal

Mutual information

Some advantages of mutual information

A problem

Simple forms for the relationship

Learning temporal invariances

Spatially invariant properties

Maximizing mutual information between a local region and a larger context

How well does it work?

But what about discontinuities?

A simple mixture approach

Mixtures of expert interpolators

The mixture of interpolators net

Mutual Information with multi-dimensional output

Optimizing non-linear transformations to maximize mutual information between multi-dimensional outputs

Beware of Gaussian assumptions

Violating the Gaussian Assumption
(experiments by Russ Salakhutdinov)

A lucky escape

Kernel Canonical Correlation
(Bach and Jordan)

Slow Feature Analysis
(Berkes & Wiskott, Wiskott & Sejnowski)

The SFA objective function

The slow features

Relationship to linear dynamical system

A way to learn non-linear transformations that maximize agreement between the outputs of two modules

An energy-based model of agreement

It’s the same cost as symmetric SNE!

The forces acting on the output vectors

Combining symmetric SNE with a feedforward neural net

A non-probabilistic version

Neighborhood Components Analysis

An objective function for NCA

Non-linear NCA