This page is outdated

Before I find time to revamp this page, please see my Google Scholar page for an up-to-date publication list.

About Me

I did my PhD under the supervision of David J. Fleet and Aaron Hertzmann at the University of Toronto, in the Computer Vision group. I am now a research team lead at Borealis AI (RBC Institute for Research). I can be reached at this email: caoy at the domain cs.toronto.edu.


Teaching

I taught an introductory machine learning course at UofT Scarborough:

CSCC11: Introduction to Machine Learning and Data Mining (Fall 2016)

Previously I TA'ed for these courses:

CSC2503: Foundations of Computer Vision (Fall 2014)

CSCC11: Introduction to Machine Learning and Data Mining (Fall 2013)

CSC108: Introduction to Computer Programming (Fall 2013)

CSC148, CSC104 (2010-2012)

My Google Scholar Page for an up-to-date list

Selected Publications (not updated)

Scaling Gaussian Processes

Yanshuai Cao

Ph.D. Thesis

Adversarial Manipulation of Deep Representations

Sara Sabour*, Yanshuai Cao*, Fartash Faghri, David J. Fleet

Accepted for ICLR 2016

We show that the representation of an image in a deep neural network (DNN) can be manipulated to mimic those of other natural images, with only minor, imperceptible perturbations to the original image. Previous methods for generating adversarial images focused on image perturbations designed to produce erroneous class labels; we instead concentrate on the internal layers of DNN representations, so our new class of adversarial images differs qualitatively from others. While the adversary is perceptually similar to one image, its internal representation appears remarkably similar to that of a different image, one from a different class, bearing little if any apparent similarity to the input; the adversarial images themselves appear generic and consistent with the space of natural images. This phenomenon raises questions about DNN representations, as well as the properties of natural images themselves.
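The attack described above can be sketched as a constrained optimization: minimize the distance between the representation of the perturbed image and that of a guide image, subject to an L-infinity bound on the perturbation. Below is a minimal toy sketch in numpy; the single ReLU layer standing in for a DNN representation, and all names and step sizes, are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for an internal DNN representation: one ReLU layer.
# (Hypothetical; the paper manipulates internal layers of real networks.)
W = rng.standard_normal((64, 32))

def phi(x):
    return np.maximum(W @ x, 0.0)

def adversarial_features(x_src, x_guide, eps=0.1, steps=1000, lr=1e-3):
    """Perturb x_src within an L-infinity ball of radius eps so that
    phi(x) moves toward phi(x_guide), via projected gradient descent."""
    target = phi(x_guide)
    x = x_src.copy()
    for _ in range(steps):
        h = W @ x
        diff = np.maximum(h, 0.0) - target
        grad = W.T @ (diff * (h > 0))        # d/dx of 0.5*||phi(x) - target||^2
        x = np.clip(x - lr * grad, x_src - eps, x_src + eps)  # stay in the ball
    return x

x_src = rng.standard_normal(32)
x_guide = rng.standard_normal(32)
x_adv = adversarial_features(x_src, x_guide)
```

The clipping step keeps the perturbation imperceptible (small in the L-infinity sense) while the gradient step pulls the internal representation toward the guide's.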

Transductive Log Opinion Pool of Gaussian Process Experts

Cao, Y., Fleet, D.J.

NIPS 2015 Workshop on Nonparametric Methods for Large Scale Representation Learning, Montreal, 2015.

We introduce a framework for analyzing transductive combination of Gaussian process (GP) experts, where independently trained GP experts are combined in a way that depends on test point location, in order to scale GPs to big data. The framework provides some theoretical justification for the generalized product of GP experts (gPoE-GP), which was previously shown to work well in practice but lacked a theoretical basis. Based on the proposed framework, an improvement over gPoE-GP is introduced and empirically validated.

Project page

Efficient Optimization for Sparse Gaussian Process Regression © IEEE

Cao, Y., Brubaker, M., Fleet, D.J. and Hertzmann, A.

IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.37, no.12, pp.2415-2427, Dec. 1 2015.

Project page

Generalized Product of Experts for Automatic and Principled Fusion of Gaussian Process Predictions

Cao, Y., Fleet, D.J.

Modern Nonparametrics 3: Automating the Learning Pipeline workshop at NIPS, Montreal, 2014.

In this work, we propose a generalized product of experts (gPoE) framework for combining the predictions of multiple probabilistic models. We identify four desirable properties that are important for scalability, expressiveness and robustness when learning and inferring with a combination of multiple models. Through analysis and experiments, we show that the gPoE of Gaussian processes (GPs) has these qualities, while no other existing combination scheme satisfies all of them at the same time. The resulting GP-gPoE is highly scalable, as individual GP experts can be learned independently in parallel; very expressive, as the way experts are combined depends on the input rather than being fixed; still a valid probabilistic model with a natural interpretation; and robust to unreliable predictions from individual experts.

Project page
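Concretely, at each test point every GP expert contributes a Gaussian prediction, and gPoE combines them as a weighted product of Gaussians. A minimal numpy sketch of the combination rule follows; the entropy-change weights (normalized to sum to one, as in the transductive log-opinion-pool variant) reflect the papers' suggestion, while the function name and example numbers are illustrative.

```python
import numpy as np

def gpoe_combine(means, variances, prior_var=1.0):
    """Generalized product of experts: combine per-expert Gaussian
    predictions (means[i], variances[i]) at a single test point.

    Weights are each expert's entropy reduction from prior to posterior,
    normalized to sum to one.
    """
    means = np.asarray(means, float)
    variances = np.asarray(variances, float)
    # Entropy drop of a 1-D Gaussian: 0.5*(log prior_var - log post_var)
    alpha = 0.5 * (np.log(prior_var) - np.log(variances))
    alpha = alpha / alpha.sum()
    # Weighted product of Gaussians: precisions add with weights alpha.
    precision = np.sum(alpha / variances)
    mean = np.sum(alpha * means / variances) / precision
    return mean, 1.0 / precision

# Two experts: a confident one near y=1.0 and an uncertain one near y=-2.0.
m, v = gpoe_combine(means=[1.0, -2.0], variances=[0.05, 0.8])
```

Because the weights depend on each expert's predictive variance at the test point, confident experts dominate the combination while uncertain ones are effectively down-weighted, which is the source of the robustness property above.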

Efficient Optimization for Sparse Gaussian Process Regression

Cao, Y., Brubaker, M., Fleet, D.J. and Hertzmann, A.

Advances in Neural Information Processing Systems (NIPS), Lake Tahoe, 2013.

We propose an efficient optimization algorithm for selecting a subset of training data to induce sparsity for Gaussian process regression. The algorithm estimates an inducing set and the hyperparameters using a single objective, either the marginal likelihood or a variational free energy. The space and time complexity are linear in training set size, and the algorithm can be applied to large regression problems on discrete or continuous domains. Empirical evaluation shows state-of-the-art performance in discrete cases and competitive results in the continuous case.

Project page
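To illustrate what "sparsity via an inducing subset" buys, here is a minimal subset-of-regressors-style sparse GP prediction in numpy, with the inducing set Z simply fixed to a few training inputs. This is only a sketch of the standard sparse predictive equations; the paper's actual contribution is the efficient algorithm for choosing Z jointly with the hyperparameters, which is not reproduced here, and the kernel, data, and names are illustrative.

```python
import numpy as np

def rbf(A, B, ell=1.0):
    """Squared-exponential kernel matrix between row-wise point sets."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d2 / ell**2)

def sparse_gp_predict(X, y, Z, Xs, noise=0.1):
    """Sparse GP predictive mean at test inputs Xs, using inducing
    inputs Z: cost is O(N m^2) instead of the O(N^3) of a full GP."""
    Kzz = rbf(Z, Z) + 1e-8 * np.eye(len(Z))   # jitter for stability
    Kzx = rbf(Z, X)
    Ksz = rbf(Xs, Z)
    A = Kzz + Kzx @ Kzx.T / noise**2
    return Ksz @ np.linalg.solve(A, Kzx @ y) / noise**2

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, (100, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(100)
Z = X[:20]                       # fixed inducing subset of training inputs
Xs = np.array([[0.0], [1.5]])
mu = sparse_gp_predict(X, y, Z, Xs)
```

With a well-chosen inducing set the m x m linear solve replaces the full N x N one, which is where the linear-in-N complexity claimed above comes from.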

Hobbies and other interests

I love Chinese poetry and occasionally write some myself.

I also love the arts of Muay Thai and boxing, and I used to train at WMT.