Research Interest

I am interested in computer vision, machine learning, and Bayesian inference. At the moment, I am focusing on scaling Gaussian processes with applications to computer vision systems. I am also fascinated about understanding the geometry and topology of high dimensional machine learning problems. For more of my recent musings, check out my research blog.

About Me

I'm a Ph.D. student in the computer vision group of the department of Computer Science at University of Toronto, supervised by David Fleet. You can reach me via email: caoy at cs dot toronto dot edu, or find me in Pratt-263.


Currently I'm teaching the introductory machine learning course at UofT Scarborough:

CSCC11: Introduction to Machine Learning and Data Mining (Fall 2016)

Previously I TA'ed for these courses:

CSC2503: Foundations of Computer Vision (Fall 2014)

CSCC11: Introduction to Machine Learning and Data Mining (Fall 2013)

CSC108 Introduction to Computer Programming (Fall 2013)

CSC148, CSC104 (2010-2012)


Adversarial Manipulation of Deep Representations

Sara Sabour*, Yanshuai Cao*, Fartash Faghri, David J. Fleet

Accepted for ICLR 2016

We show that the representation of an image in a deep neural network (DNN) can be manipulated to mimic those of other natural images, with only minor, imperceptible perturbations to the original image. Previous methods for generating adversarial images focused on image perturbations designed to produce erroneous class labels, while we concentrate on the internal layers of DNN representations. In this way our new class of adversarial images differs qualitatively from others. While the adversary is perceptually similar to one image, its internal representation appears remarkably similar to a different image, one from a different class, bearing little if any apparent similarity to the input; they appear generic and consistent with the space of natural images. This phenomenon raises questions about DNN representations, as well as the properties of natural images themselves.

Transductive Log Opinion Pool of Gaussian Process Experts

Cao, Y., Fleet, D.J.

NIPS2015 Workshop on Nonparametric Methods for Large Scale Representation Learning, Montreal, 2015.

We introduce a framework for analyzing transductive combination of Gaussian process (GP) experts, where independently trained GP experts are combined in a way that depends on test point location, in order to scale GPs to big data. The framework provides some theoretical justification for the generalized product of GP experts (gPoE-GP) which was previously shown to work well in practice but lacks theoretical basis. Based on the proposed framework, an improvement over gPoE-GP is introduced and empirically validated.

Project page

Efficient Optimization for Sparse Gaussian Process Regression © IEEE

Cao, Y., Brubaker, M., Fleet, D.J. and Hertzmann, A.

IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.37, no.12, pp.2415-2427, Dec. 1 2015.

Project page

Generalized Product of Experts for Automatic and Principled Fusion of Gaussian Process Predictions

Cao, Y., Fleet, D.J.

Modern Nonparametrics 3: Automating the Learning Pipeline workshop at NIPS, Montreal, 2014.

In this work, we propose a generalized product of experts (gPoE) framework for combining the predictions of multiple probabilistic models. We identify four desirable properties that are important for scalability, expressiveness and robustness, when learning and inferring with a combination of multiple models. Through analysis and experiments, we show that gPoE of Gaussian processes (GP) have these qualities, while no other existing combination schemes satisfy all of them at the same time. The resulting GP-gPoE is highly scalable as individual GP experts can be independently learned in parallel; very expressive as the way experts are combined depends on the input rather than fixed; the combined prediction is still a valid probabilistic model with natural interpretation; and finally robust to unreliable predictions from individual experts.

Project page

Efficient Optimization for Sparse Gaussian Process Regression

Cao, Y., Brubaker, M., Fleet, D.J. and Hertzmann, A.

Advances in Neural Information Processing Systems (NIPS), Lake Tahoe, 2013.

We propose an efficient optimization algorithm for selecting a subset of training data to induce sparsity for Gaussian process regression. The algorithm estimates an inducing set and the hyperparameters using a single objective, either the marginal likelihood or a variational free energy. The space and time complexity are linear in training set size, and the algorithm can be applied to large regression problems on discrete or continuous domains. Empirical evaluation shows state-of-art performance in discrete cases and competitive results in the continuous case.

Project page

Course Projects

Sequential Decision Making For Vision Tracking

Final project for CSC2534: Decision Making Under Uncertainty (Winter 2013)

In this project, we propose a formulation of vision tracking under sequential the decision making frameworks of Markov decision process (MDP) and partially observable Markov decision process (POMDP). From a very simple basic tracker, we build an interactive tracking algorithm that is capable of robustly track targets over long sequences, while still being as autonomous as it can be. We study the effect of uncertainty of state information, and the impact of active observation gathering versus passive observation under the framework.

A Bayesian Approach to Modeling Social Conversation Groups

Final project for CSC2541: Topics in Machine Learning: Bayesian Methods for Machine Learning (Winter 2011)

Constraints in social interaction is valuable prior information for detecting or tracking people in computer vision. This project proposes a probabilistic model of spatial relations among people in social events where conversation groups form. The model is based on very intuitive principles behind how people socialize, and it can reproduce a number of the observed phenomena. By using MCMC, one can infer the true position and orientation of people as well as conversation group memberships given observation. Finally, a framework in which the model could facilitate tracking multiple interacting people is discussed.

An Appearance Model Based On Transforming Autoencoders For Vision Tracking

Final project for CSC2535: Advanced Machine Learning (Winter 2011)

In this project, transforming autoencoder is used to build an appearance model that captures parts and spatial relationships among parts of target objects for vision tracking. With a very limited motion model, and basic particle filtering, the resulting tracker is able to track target MNIST digit in the presence of distractors with different or similar appearance.

Hobbies and other interests

I love Chinese poetry and occasionally write some myself.

I also love the arts of Muay Thai and boxing, and I used to train at WMT.