Dimensionality Reduction @ Toronto

Dimensionality Reduction and Metric Learning

The aim of dimensionality reduction is to construct a low-dimensional representation of high-dimensional input data in such a way that the important parts of the structure or geometry of the input data are preserved. The aim of metric learning is to learn a metric in the high-dimensional space in such a way as to minimize the generalization error of nearest neighbor classifiers trained on the high-dimensional data. Metric learning and dimensionality reduction are intimately related, as learning a Mahalanobis metric is identical to learning a linear subspace of the data.
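The claim that a Mahalanobis metric corresponds to a linear subspace is easy to check numerically: writing the metric as M = A^T A, the Mahalanobis distance between two points equals the Euclidean distance between their images under the linear map A. The sketch below (an illustration with made-up matrices and data, not code from our group) verifies this.

```python
# Minimal numerical check: a Mahalanobis metric with M = A^T A is the same as
# Euclidean distance after projecting the data with the linear map A.
# A, x and y are arbitrary made-up examples.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 5))        # linear map from 5 dimensions to a 2-D subspace
M = A.T @ A                        # the induced Mahalanobis matrix
x, y = rng.normal(size=5), rng.normal(size=5)

d_mahalanobis = np.sqrt((x - y) @ M @ (x - y))
d_projected = np.linalg.norm(A @ x - A @ y)
print(np.isclose(d_mahalanobis, d_projected))  # True
```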

The three main dimensionality reduction techniques on which the Toronto group is working are LLE, autoencoders, and (t-)SNE. The main metric learning technique on which we work is NCA.

Locally Linear Embedding (LLE)

In LLE, the data is viewed as lying on or near a low-dimensional manifold that is embedded in the high-dimensional space. LLE assumes local linearity of this data manifold and describes each high-dimensional datapoint as a linear combination, with weights W, of its k nearest neighbors. Subsequently, it constructs a low-dimensional data representation in such a way that the low-dimensional points can be reconstructed as well as possible using the reconstruction weights W. More information can be found in the following publication:


  1. Sam T. Roweis and Lawrence K. Saul. Nonlinear Dimensionality Reduction by Locally Linear Embedding. 2000. Science, 290(5500):2323-2326.


The initial work on LLE was performed by Sam and Lawrence before Sam came to Toronto. A separate website on LLE is available here.
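The two LLE steps described above (solving for the reconstruction weights W, and then finding low-dimensional points that are reconstructed as well as possible by those weights) can be sketched in a few lines of NumPy. This is an illustrative reimplementation with default choices (Euclidean nearest neighbors, a small regularizer on the local Gram matrix), not the original LLE code.

```python
# Illustrative LLE sketch: weights from local linear reconstructions, embedding
# from the bottom eigenvectors of (I - W)^T (I - W). Parameter values are arbitrary.
import numpy as np

def lle(X, n_neighbors=10, n_components=2, reg=1e-3):
    n = X.shape[0]
    # Step 1: reconstruction weights W from the k nearest neighbors of each point.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(D[i])[1:n_neighbors + 1]     # skip the point itself
        Z = X[idx] - X[i]                             # neighbors centered on x_i
        C = Z @ Z.T                                   # local Gram matrix
        C += reg * np.trace(C) * np.eye(n_neighbors)  # regularize for stability
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, idx] = w / w.sum()                       # weights sum to one
    # Step 2: low-dimensional points that the same weights reconstruct best.
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:n_components + 1]                # drop the constant eigenvector

# Example: unroll a small noisy spiral into two dimensions.
t = np.linspace(0, 3 * np.pi, 300)
noise = np.random.default_rng(0).normal(0, 0.05, t.size)
X = np.column_stack([t * np.cos(t), t * np.sin(t), noise])
Y = lle(X, n_neighbors=12)
print(Y.shape)  # (300, 2)
```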

Autoencoders

Autoencoders are neural networks in which one of the hidden layers has a small number of hidden units. The number of input and output units is equal (viz., the data dimensionality). The network is trained to minimize the difference between its input and its output, i.e., it is trained to reconstruct the input data. This leads to a low-dimensional data representation in the narrow hidden layer. To circumvent poor local minima in backpropagation, the autoencoder is greedily pretrained using a stack of restricted Boltzmann machines (RBMs). More information can be found in the following publication:


  1. Geoffrey E. Hinton and Ruslan R. Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks. 2006. Science, 313(5786):504-507.


A separate website on autoencoders is available here.
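A minimal autoencoder along these lines can be sketched in NumPy: a narrow tanh bottleneck layer, an output layer of the same size as the input, and plain backpropagation on the reconstruction error. The greedy RBM pretraining described above is omitted; this is only an illustration of the architecture and objective (with made-up layer sizes and learning rate), not the code behind the publication.

```python
# Illustrative autoencoder sketch: encode to a narrow bottleneck, decode back to
# the input dimensionality, and minimize the reconstruction error by gradient
# descent. RBM pretraining is omitted; hyperparameters are arbitrary.
import numpy as np

def train_autoencoder(X, n_hidden=2, n_iter=2000, lr=0.01, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    W1 = 0.1 * rng.normal(size=(d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = 0.1 * rng.normal(size=(n_hidden, d)); b2 = np.zeros(d)
    for _ in range(n_iter):
        H = np.tanh(X @ W1 + b1)          # narrow bottleneck (the low-dimensional code)
        Xhat = H @ W2 + b2                # linear reconstruction of the input
        G = 2.0 * (Xhat - X) / n          # gradient of the mean reconstruction error
        dW2 = H.T @ G; db2 = G.sum(0)
        dH = G @ W2.T
        dpre = dH * (1.0 - H ** 2)        # backpropagate through the tanh
        dW1 = X.T @ dpre; db1 = dpre.sum(0)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    return W1, b1, W2, b2

# Example: compress 10-dimensional data that actually lives near a 2-D subspace.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 10))
W1, b1, W2, b2 = train_autoencoder(X, n_hidden=2)
codes = np.tanh(X @ W1 + b1)              # the learned low-dimensional representation
print(codes.shape)  # (500, 2)
```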

Stochastic Neighbor Embedding (SNE)

SNE and its successors UNI-SNE and t-SNE measure the similarity of high-dimensional points using a Gaussian kernel, and renormalize to obtain probabilities over all pairs of points that reflect the similarities between those points. In the low-dimensional space, similar probabilities are defined, and the points are configured in such a way as to minimize the Kullback-Leibler divergence between the two distributions. The variants of SNE differ in the kernel they use in the low-dimensional space: the Student-t kernel that is used in t-SNE appears to be very successful. More information on (t-)SNE can be found in the following publications:


  1. Geoffrey E. Hinton and Sam T. Roweis. Stochastic Neighbor Embedding. 2003. In Advances in Neural Information Processing Systems 15 (NIPS '02), pages 857-864.

  2. Laurens J.P. van der Maaten and Geoffrey E. Hinton. Visualizing High-Dimensional Data Using t-SNE. 2008. Journal of Machine Learning Research, 9:2579-2605.


A separate website on t-SNE is available here.
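The sketch below illustrates the t-SNE computation: Gaussian affinities in the high-dimensional space, Student-t affinities in the embedding, and gradient descent on the Kullback-Leibler divergence between the two. The perplexity-based calibration of the per-point bandwidths, the momentum, and the early exaggeration used in the actual algorithm are omitted; the fixed bandwidth and learning rate are arbitrary illustrative choices, and this is not the reference implementation.

```python
# Bare-bones t-SNE sketch (not the reference implementation): Gaussian P in the
# input space, Student-t Q in the embedding, gradient descent on KL(P || Q).
import numpy as np

def tsne_sketch(X, n_components=2, sigma=1.0, n_iter=500, lr=10.0, seed=0):
    n = X.shape[0]
    # High-dimensional similarities: Gaussian kernel, renormalized and symmetrized.
    D2 = np.square(X[:, None, :] - X[None, :, :]).sum(-1)
    P = np.exp(-D2 / (2 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    P /= P.sum(axis=1, keepdims=True)      # conditional p_{j|i}
    P = (P + P.T) / (2 * n)                # symmetrized joint p_{ij}

    rng = np.random.default_rng(seed)
    Y = 1e-2 * rng.normal(size=(n, n_components))
    for _ in range(n_iter):
        d2 = np.square(Y[:, None, :] - Y[None, :, :]).sum(-1)
        num = 1.0 / (1.0 + d2)             # Student-t kernel with one degree of freedom
        np.fill_diagonal(num, 0.0)
        Q = num / num.sum()                # low-dimensional joint q_{ij}
        PQ = (P - Q) * num
        grad = 4.0 * ((np.diag(PQ.sum(1)) - PQ) @ Y)   # gradient of KL(P || Q)
        Y -= lr * grad
    return Y

Y = tsne_sketch(np.random.default_rng(1).normal(size=(100, 10)))
print(Y.shape)  # (100, 2)
```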

Neighborhood Components Analysis (NCA)

NCA is very similar to SNE in that it measures the similarity of the data points using a Gaussian kernel and renormalizes. In contrast to SNE, NCA aims to learn a Mahalanobis metric (which is identical to learning a linear subspace) in which the sum of the measured similarities between points of the same class is maximized. A later variant of NCA called MCML reformulates the NCA objective function to be convex. More information on NCA and MCML can be found in the following publications:


  1. Jacob Goldberger, Sam T. Roweis, Geoffrey E. Hinton and Ruslan R. Salakhutdinov. Neighbourhood Components Analysis. 2005. In Advances in Neural Information Processing Systems 17 (NIPS '04), pages 513-520.

  2. Amir Globerson and Sam Roweis. Metric Learning by Collapsing Classes. 2006. In Advances in Neural Information Processing Systems 18 (NIPS '05), pages 451-458.
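The NCA objective itself is compact: project the data with a linear map A, form stochastic-neighbor probabilities with a Gaussian kernel, and sum the probabilities that fall on points of the same class. The sketch below maximizes this objective with a generic off-the-shelf optimizer (L-BFGS with numerical gradients from SciPy) on made-up toy data; it illustrates the objective, not the authors' implementation.

```python
# Illustrative NCA sketch: the objective is the expected number of points whose
# stochastic nearest neighbor has the same class label, as a function of the
# linear map A. Data, optimizer and sizes are arbitrary illustrative choices.
import numpy as np
from scipy.optimize import minimize

def nca_objective(A_flat, X, y, n_components):
    A = A_flat.reshape(n_components, X.shape[1])
    Z = X @ A.T                                 # project the data with A
    D2 = np.square(Z[:, None, :] - Z[None, :, :]).sum(-1)
    np.fill_diagonal(D2, np.inf)                # a point never picks itself
    D2 -= D2.min(axis=1, keepdims=True)         # stabilize the exponentials
    P = np.exp(-D2)
    P /= P.sum(axis=1, keepdims=True)           # renormalized Gaussian similarities
    same = y[:, None] == y[None, :]
    return -np.sum(P[same])                     # negated: the optimizer minimizes

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(2, 1, (20, 5))])
y = np.array([0] * 20 + [1] * 20)
A0 = rng.normal(size=(2, 5)).ravel()            # random initial linear map
res = minimize(nca_objective, A0, args=(X, y, 2), method="L-BFGS-B")
A = res.x.reshape(2, 5)                         # the learned metric is M = A^T A
print(-res.fun)                                 # expected number of correct neighbors
```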
