Machine Learning

On the Convergence of Bound Optimization Algorithms

  • Many practitioners who use EM and related algorithms complain that they are sometimes slow. When does this happen, and what can be done about it? In this paper, we study the general class of bound optimization algorithms -- including EM, Iterative Scaling, Non-negative Matrix Factorization, CCCP -- and their relationship to direct optimization algorithms such as gradient-based methods for parameter learning. We derive a general relationship between the updates performed by bound optimization methods and those of gradient and second-order methods and identify analytic conditions under which bound optimization algorithms exhibit quasi-Newton behavior, and conditions under which they possess poor, first-order convergence. Based on this analysis, we consider several specific algorithms, interpret and analyze their convergence properties and provide some recipes for preprocessing input to these algorithms to yield faster convergence behavior. We report empirical results supporting our analysis and showing that simple data preprocessing can result in dramatically improved performance of bound optimizers in practice.
  • Here is a preprint of a paper, written with Sam Roweis and Zoubin Ghahramani: On the Convergence of Bound Optimization Algorithms. A minimal sketch of the generic bound optimization step that all of these algorithms share is given below.
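  • The common skeleton behind all of these methods is easy to state. Below is a minimal sketch in LaTeX notation (the symbols L for the objective, Q for the bound, and \theta^{t} for the current parameters are my own choices, not the paper's): at every iteration one constructs a lower bound that touches the objective at the current iterate and then maximizes that bound,

        Q(\theta; \theta^{t}) \le L(\theta) \quad \forall\, \theta, \qquad
        Q(\theta^{t}; \theta^{t}) = L(\theta^{t}), \qquad
        \theta^{t+1} = \arg\max_{\theta} Q(\theta; \theta^{t}).

    Monotone improvement of the objective follows immediately from the chain

        L(\theta^{t+1}) \ge Q(\theta^{t+1}; \theta^{t}) \ge Q(\theta^{t}; \theta^{t}) = L(\theta^{t}),

    and the question studied above is how fast this improvement is, which is answered by relating the step \theta^{t+1} - \theta^{t} to gradient and second-order steps on L itself.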

Direct Optimization in Latent Variable Models

  • We show a close relationship between the Expectation-Maximization (EM) algorithm and direct optimization algorithms such as gradient-based methods for parameter learning. We identify analytic conditions under which EM exhibits quasi-Newton behavior, and conditions under which it possesses poor, first-order convergence. Based on this analysis, we propose two novel algorithms for maximum likelihood estimation of latent variable models, and report empirical results showing that, as predicted by the theory, the proposed new algorithms can substantially outperform standard EM in terms of speed of convergence in certain cases.
  • Here is a preprint of a paper, written with Sam Roweis and Zoubin Ghahramani: Optimization with EM and Expectation-Conjugate-Gradient.
  • Here is a poster presentation at the 12th Annual Canadian Conference on Intelligent Systems, May 2002, Calgary, Canada: Is EM Really Better? A toy comparison of EM updates and gradient updates on a simple mixture model is sketched below.
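  • To make the contrast concrete, here is a small, purely illustrative comparison of the two update styles (a toy example with names and step sizes of my own choosing, not the ECG algorithm from the paper): fitting the two means of an equal-weight, unit-variance Gaussian mixture by EM steps versus plain gradient ascent on the log-likelihood.

        import numpy as np

        rng = np.random.default_rng(0)
        x = np.concatenate([rng.normal(-2.0, 1.0, 500), rng.normal(2.0, 1.0, 500)])

        def responsibilities(x, mu):
            # Posterior probability of each component for every data point
            # (equal mixing weights, unit variances).
            log_p0 = -0.5 * (x - mu[0]) ** 2
            log_p1 = -0.5 * (x - mu[1]) ** 2
            r1 = 1.0 / (1.0 + np.exp(log_p0 - log_p1))
            return np.stack([1.0 - r1, r1], axis=1)          # shape (n, 2)

        def em_step(x, mu):
            # E-step: compute responsibilities; M-step: responsibility-weighted means.
            r = responsibilities(x, mu)
            return r.T @ x / r.sum(axis=0)

        def gradient_step(x, mu, lr=1e-3):
            # One step of plain gradient ascent on the log-likelihood w.r.t. the means.
            r = responsibilities(x, mu)
            grad = (r * (x[:, None] - mu[None, :])).sum(axis=0)
            return mu + lr * grad

        mu_em = np.array([-0.5, 0.5])
        mu_gd = np.array([-0.5, 0.5])
        for t in range(50):
            mu_em = em_step(x, mu_em)
            mu_gd = gradient_step(x, mu_gd)
        print("EM:", mu_em, "  gradient ascent:", mu_gd)

    In this toy model each EM step is exactly the gradient step rescaled, per component, by one over that component's total responsibility, which is the flavor of relationship between EM and scaled-gradient updates that the paper analyzes in general.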

Adaptive Overrelaxed Bound Optimization Algorithms

  • We study a class of overrelaxed bound optimization algorithms, and their relationship to standard bound optimizers, such as Expectation-Maximization, Iterative Scaling, CCCP and Non-Negative Matrix Factorization. We provide a theoretical analysis of the convergence properties of these optimizers and identify analytic conditions under which they are expected to outperform the standard versions. Based on this analysis, we propose a novel, simple adaptive overrelaxed scheme for practical optimization and report empirical results on several synthetic and real-world data sets showing that these new adaptive methods exhibit superior performance (in certain cases by several orders of magnitude) compared to their traditional counterparts. Our "drop-in" extensions are simple to implement, apply to a wide variety of algorithms, almost always give a substantial speedup, and do not require any theoretical analysis of the underlying algorithm.
  • Here is a preprint of a paper, written with Sam Roweis: Adaptive Overrelaxed Bound Optimization Methods. A schematic of an adaptive overrelaxation wrapper is sketched below.
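  • The "drop-in" nature of these extensions is easy to see from a schematic. Below is one plausible form of an adaptive overrelaxation wrapper around an arbitrary bound optimizer whose single update is theta -> M(theta) and whose objective is L; the specific adaptation rule here (grow the step length while the objective keeps improving, otherwise fall back to the plain update) is my own sketch rather than the exact rule from the paper.

        def adaptive_overrelaxed(theta, M, L, n_iters=100, eta=1.0, grow=1.1):
            # theta : initial parameter vector (e.g. a numpy array)
            # M     : one step of the underlying bound optimizer (e.g. one EM update)
            # L     : the objective being maximized (e.g. the log-likelihood)
            obj = L(theta)
            for _ in range(n_iters):
                step = M(theta)                       # standard bound-optimizer update
                trial = theta + eta * (step - theta)  # overrelaxed (extrapolated) update
                trial_obj = L(trial)
                if trial_obj > obj:                   # accept, and extrapolate harder next time
                    theta, obj, eta = trial, trial_obj, eta * grow
                else:                                 # reject: take the safe, monotone step
                    theta, obj, eta = step, L(step), 1.0
            return theta

    With eta held at 1 the wrapper reduces exactly to the underlying algorithm, which is why such extensions can be dropped into existing code without any theoretical analysis of M.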

Unsupervised Text Segmentation with Constrained HMMs

  • Here I am working with Sam Roweis on a model that learns to segment (or group) sequential data in a completely unsupervised fashion. The model possesses hierarchical structure, but remains extremely efficient thanks to its greedy approach to both training and inference. We also report preliminary empirical results suggesting that this approach to text segmentation is promising. This project is in progress.
  • Here is a poster presentation I gave at our ML group meeting entitled An Efficient Greedy Way to Perform Unsupervised Text Segmentation. A generic sketch of a segmentation-friendly HMM transition structure is given below.
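  • Since the model itself is not described in detail here, the sketch below is only a generic illustration of what constraining an HMM toward segmentation can mean (the sticky transition structure and all names are my own, not the model above): each hidden state plays the role of a segment label, and a strong self-transition bias makes decoded state sequences form long contiguous segments.

        import numpy as np

        def segmental_transition_matrix(n_states, stay=0.98):
            # Each hidden state strongly prefers to persist, so the most likely
            # state sequence under this transition structure tends to consist of
            # long contiguous runs (segments) rather than rapid switching.
            switch = (1.0 - stay) / (n_states - 1)
            A = np.full((n_states, n_states), switch)
            np.fill_diagonal(A, stay)
            return A        # each row sums to 1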


Ruslan Salakhutdinov, Artificial Intelligence Group, Toronto Computer Science || www.cs.toronto.edu/~rsalakhu