Fantope Regularization in Metric Learning

Marc T. Law, Nicolas Thome, Matthieu Cord

Sorbonne Universités, UPMC Univ Paris 06, UMR 7606, LIP6, F-75005, Paris, France

[pdf] [Matlab code] [Matlab Code for LFW (165 MB)]

Abstract


This paper introduces a regularization method to explicitly control the rank of a learned symmetric positive semidefinite distance matrix in distance metric learning. To this end, we propose to incorporate in the objective function a linear regularization term that minimizes the k smallest eigenvalues of the distance matrix. It is equivalent to minimizing the trace of the product of the distance matrix with a matrix in the convex hull of rank-k projection matrices, called a Fantope. Based on this new regularization method, we derive an optimization scheme to efficiently learn the distance matrix. We demonstrate the effectiveness of the method on synthetic and challenging real datasets of face verification and image classification with relative attributes, on which our method outperforms state-of-the-art metric learning algorithms.

Approach

The goal is to solve the following optimization problem:
min_M  μ R(M) + loss(M)   subject to   M symmetric positive semidefinite              (1)

where μ is a nonnegative regularization parameter, loss(M) is a loss function that is convex in M, and R(M) is a regularization term which, in our paper, corresponds to the sum of the k smallest eigenvalues of M.
We can rewrite R(M) = tr(WM), where W is the orthogonal projector onto the eigenvectors of M associated with its k smallest eigenvalues.
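For reference, with λ_1(M) ≤ … ≤ λ_d(M) denoting the eigenvalues of M in ascending order (d being the size of M), the regularizer can be written in LaTeX notation as a minimization over the Fantope, i.e. the convex hull of the rank-k projection matrices:

R(M) = \sum_{i=1}^{k} \lambda_i(M) = \min_{W \in \mathcal{F}_k} \mathrm{tr}(WM),
\qquad
\mathcal{F}_k = \{ W = W^\top : 0 \preceq W \preceq I, \ \mathrm{tr}(W) = k \}

The matrix W described above is a minimizer of this problem.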
We give Matlab code below to compute such a W efficiently.

Matlab Code

Computing the sum of the k smallest eigenvalues

The regularization framework that we propose is general and can be applied in any problem that learns a symmetric PSD matrix M. We describe here how to compute the matrix W such that tr(WM) is the sum of the k smallest eigenvalues.
Let dim be the size of the dim x dim symmetric PSD matrix M, whose rank is smaller than or equal to e:
dim = 50; e = 10; k = dim-e;   % regularize the k smallest eigenvalues
L = rand(e,dim);               % random e x dim factor
M = L'*L;                      % dim x dim symmetric PSD matrix of rank at most e


To compute the matrix W such that tr(WM) is the sum of the k smallest eigenvalues of M:
function [W] = compute_W_from_M(M,k)
    % W is the orthogonal projector onto the eigenvectors of M associated
    % with its k smallest eigenvalues, so that tr(WM) is their sum
    [V,D] = eig(M);                     % eigendecomposition of symmetric M
    [~,b] = sort(diag(D),'ascend');     % indices of eigenvalues in ascending order
    V_W = V(:,b(1:k));                  % eigenvectors of the k smallest eigenvalues
    W = V_W * V_W';                     % orthogonal projector
end


The sum of the k smallest eigenvalues of M is then:
W = compute_W_from_M(M,k);
sum(sum(M.*W))   % equals trace(W*M) since M and W are symmetric
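
As a quick sanity check (not part of the original code), this quantity can be compared against the k smallest eigenvalues computed directly:
eigenvalues_M = sort(eig(M),'ascend');   % all eigenvalues of M in ascending order
assert(abs(sum(eigenvalues_M(1:k)) - sum(sum(M.*W))) < 1e-8);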

Optimizing the problem in Eq. (1) over M

Let nb_iterations be the number of iterations of the projected gradient descent algorithm. Both the projection onto the PSD cone and the computation of W require an eigendecomposition of M, so the same decomposition can be reused for both steps. This avoids two eigendecompositions per iteration:

% dummy placeholder: replace with the (sub)gradient of the actual loss
gradient_of_loss = @(M) rand(size(M)) - rand(size(M));
W = compute_W_from_M(M,k); mu = 10; step = 0.01; nb_iterations = 100;   % example values

for iter=1:nb_iterations
    Gradient = mu * W + gradient_of_loss(M);   % gradient of the objective in Eq. (1)
    M = M - step * Gradient;                   % gradient descent step
    M = (M + M')/2;                            % keep M numerically symmetric
    [V,D] = eig(M);
    % projection onto the PSD cone:
    D(D<0) = 0;
    M = V * D * V';
    % reuse the same eigendecomposition to compute W for the next iteration:
    [~,b] = sort(diag(D),'ascend');
    V_W = V(:,b(1:k));
    W = V_W * V_W';
end
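
The gradient_of_loss above is only a random placeholder. As an illustration of what a real loss gradient could look like (this sketch is not part of our code), here is a subgradient of a pairwise hinge loss on Mahalanobis distances, where the data matrix X, the pair index list pairs, the labels y and the threshold b are hypothetical:

% Illustrative sketch: subgradient with respect to M of the pairwise hinge loss
% sum_c max(0, 1 + y_c * (d_M(x_i,x_j) - b)), with d_M(x_i,x_j) = (x_i-x_j)' M (x_i-x_j).
% X:     dim x n data matrix
% pairs: p x 2 matrix of example indices (i,j)
% y:     p x 1 labels, +1 for similar pairs, -1 for dissimilar pairs
% b:     distance threshold separating similar from dissimilar pairs
function [G] = hinge_loss_gradient(M, X, pairs, y, b)
    G = zeros(size(M));
    for c = 1:size(pairs,1)
        diff = X(:,pairs(c,1)) - X(:,pairs(c,2));
        d = diff' * M * diff;                 % Mahalanobis distance between the pair
        if 1 + y(c) * (d - b) > 0             % hinge is active: pair violates its constraint
            G = G + y(c) * (diff * diff');    % subgradient contribution of this pair
        end
    end
end

It can then be plugged into the loop above with, e.g., gradient_of_loss = @(M) hinge_loss_gradient(M, X, pairs, y, b);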


Matlab code is provided here for an experiment on a toy dataset and here for an experiment on the Labeled Faces in the Wild (LFW) dataset. We include the (square-rooted) features provided by Matthieu Guillaumin in our zip archive.

Paper

[1] Fantope Regularization in Metric Learning, M.T. Law, N. Thome, M. Cord, CVPR 2014 [pdf] [Matlab code] [Matlab Code for LFW]
Last modified 26-2-2015