
Research Interest

My long-term research goal is to develop systems that learn good generic representations of the world from limited, weak, or no explicit supervision, while still adapting quickly to new tasks with sample efficiency comparable to that of humans. Currently, I am exploring adversarial machine learning as a tool toward this goal, both the adversarial perturbation phenomenon and adversarial training à la GAN. Previously, during my PhD, I worked on scaling up Gaussian processes, which are sample efficient but computationally expensive. For more of my old musings, check out my research blog.


About Me

I did my PhD in the Computer Vision group at the University of Toronto under the supervision of David J. Fleet and Aaron Hertzmann. I am now a research team lead at Borealis AI (RBC Institute for Research). I can be reached by email at caoy at cs.toronto.edu.


Teaching

I taught the introductory machine learning course at UofT Scarborough:

CSCC11: Introduction to Machine Learning and Data Mining (Fall 2016)

Previously I TA'ed for these courses:

CSC2503: Foundations of Computer Vision (Fall 2014)

CSCC11: Introduction to Machine Learning and Data Mining (Fall 2013)

CSC108: Introduction to Computer Programming (Fall 2013)

CSC148, CSC104 (2010-2012)

See my Google Scholar page for an up-to-date list.

Selected Publications (not updated)

Adversarial Manipulation of Deep Representations

Sara Sabour*, Yanshuai Cao*, Fartash Faghri, David J. Fleet

Accepted for ICLR 2016

We show that the representation of an image in a deep neural network (DNN) can be manipulated to mimic that of another natural image, with only minor, imperceptible perturbations to the original image. Previous methods for generating adversarial images focused on perturbations designed to produce erroneous class labels, whereas we target the internal layers of DNN representations; in this way our new class of adversarial images differs qualitatively from others. While the adversary is perceptually similar to one image, its internal representation is remarkably similar to that of a different image, one from a different class, bearing little if any apparent similarity to the input; these adversarial representations appear generic and consistent with the space of natural images. This phenomenon raises questions about DNN representations, as well as the properties of natural images themselves.
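
As a rough illustration, the attack can be sketched as gradient descent on a feature-matching loss with a projection that keeps the perturbation imperceptible. This is a minimal sketch, not the paper's exact procedure; the feature extractor phi, the hyperparameters, and the pixel range are illustrative assumptions.

    import torch

    # Sketch: perturb `source` so that phi(source + delta) mimics phi(guide),
    # while keeping the perturbation inside an L-infinity ball of radius eps.
    # All names and hyperparameters are illustrative, not the paper's settings.
    def manipulate_representation(phi, source, guide, eps=0.02, steps=200, lr=0.01):
        with torch.no_grad():
            target = phi(guide)                      # internal representation to mimic
        delta = torch.zeros_like(source, requires_grad=True)
        opt = torch.optim.Adam([delta], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = ((phi(source + delta) - target) ** 2).sum()
            loss.backward()
            opt.step()
            with torch.no_grad():
                delta.clamp_(-eps, eps)              # stay perceptually close to source
                delta.copy_((source + delta).clamp(0, 1) - source)  # keep pixels valid
        return (source + delta).detach()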


Transductive Log Opinion Pool of Gaussian Process Experts

Cao, Y., Fleet, D.J.

NIPS 2015 Workshop on Nonparametric Methods for Large Scale Representation Learning, Montreal, 2015.

We introduce a framework for analyzing the transductive combination of Gaussian process (GP) experts, in which independently trained GP experts are combined in a way that depends on the test point location, in order to scale GPs to big data. The framework provides theoretical justification for the generalized product of GP experts (gPoE-GP), which was previously shown to work well in practice but lacked a theoretical basis. Based on the proposed framework, we introduce and empirically validate an improvement over gPoE-GP.
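
For Gaussian experts, a test-point-dependent log opinion pool has a simple closed form. The following is a sketch in standard notation, not the paper's exact weighting scheme:

\[
p(y \mid x_\ast) \;\propto\; \prod_i p_i(y \mid x_\ast)^{\alpha_i(x_\ast)},
\qquad \sum_i \alpha_i(x_\ast) = 1,
\]
\[
T(x_\ast) = \sum_i \alpha_i(x_\ast)\, T_i(x_\ast),
\qquad
\mu(x_\ast) = \frac{1}{T(x_\ast)} \sum_i \alpha_i(x_\ast)\, T_i(x_\ast)\, \mu_i(x_\ast),
\]

where \(\mu_i(x_\ast)\) and \(T_i(x_\ast) = 1/\sigma_i^2(x_\ast)\) are expert i's predictive mean and precision, and the weights \(\alpha_i\) depend on the test location.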

Project page


Efficient Optimization for Sparse Gaussian Process Regression © IEEE

Cao, Y., Brubaker, M., Fleet, D.J. and Hertzmann, A.

IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 12, pp. 2415-2427, Dec. 2015.

Project page


Generalized Product of Experts for Automatic and Principled Fusion of Gaussian Process Predictions

Cao, Y., Fleet, D.J.

Modern Nonparametrics 3: Automating the Learning Pipeline workshop at NIPS, Montreal, 2014.

In this work, we propose a generalized product of experts (gPoE) framework for combining the predictions of multiple probabilistic models. We identify four properties that are important for scalability, expressiveness, and robustness when learning and inferring with a combination of multiple models. Through analysis and experiments, we show that the gPoE of Gaussian processes (GPs) has all four qualities, while no existing combination scheme satisfies them all at the same time. The resulting GP-gPoE is highly scalable, as individual GP experts can be learned independently in parallel; it is expressive, as the way experts are combined depends on the input rather than being fixed; the combined prediction is still a valid probabilistic model with a natural interpretation; and it is robust to unreliable predictions from individual experts.
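
As a rough sketch of the fusion step (the function and argument names are illustrative placeholders; in the paper the weights reflect each expert's confidence at the test input, via the change in entropy from prior to posterior):

    import numpy as np

    # Sketch of gPoE-style fusion of independent GP expert predictions at one
    # test point. Each expert enters only through its weighted predictive
    # precision, so an expert barely more confident than the prior contributes
    # little to the fused estimate.
    def gpoe_combine(means, variances, weights):
        precisions = weights / variances          # weighted predictive precisions
        total_precision = precisions.sum()
        fused_mean = (precisions * means).sum() / total_precision
        fused_variance = 1.0 / total_precision
        return fused_mean, fused_variance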

Project page


Efficient Optimization for Sparse Gaussian Process Regression

Cao, Y., Brubaker, M., Fleet, D.J. and Hertzmann, A.

Advances in Neural Information Processing Systems (NIPS), Lake Tahoe, 2013.

We propose an efficient optimization algorithm for selecting a subset of training data to induce sparsity in Gaussian process regression. The algorithm estimates the inducing set and the hyperparameters using a single objective, either the marginal likelihood or a variational free energy. Space and time complexity are linear in the training set size, and the algorithm can be applied to large regression problems on discrete or continuous domains. Empirical evaluation shows state-of-the-art performance in discrete cases and competitive results in the continuous case.
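
For reference, the variational free energy mentioned above has the well-known form (a sketch in standard sparse-GP notation, with n training points and m inducing points):

\[
\mathcal{F} = \log \mathcal{N}\!\left(\mathbf{y} \mid \mathbf{0},\; Q_{nn} + \sigma^2 I\right)
\;-\; \frac{1}{2\sigma^2}\,\mathrm{tr}\!\left(K_{nn} - Q_{nn}\right),
\qquad
Q_{nn} = K_{nm} K_{mm}^{-1} K_{mn},
\]

where \(K_{nm}\) is the kernel matrix between training and inducing points; dropping the trace term recovers the approximate marginal likelihood objective.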

Project page



Hobbies and other interests

I love Chinese poetry and occasionally write some myself.

I also love the arts of Muay Thai and boxing, and I used to train at WMT.