My long-term research goal is to develop systems that can learn good generic representations of the world from limited, weak, or no explicit supervision, while still adapting quickly to new tasks with sample efficiency comparable to humans. Currently, I am exploring adversarial machine learning as a tool toward this goal, covering both the adversarial perturbation phenomenon and adversarial training à la GANs. Previously, during my PhD, I worked on scaling Gaussian processes, which are sample efficient but computationally burdensome. For more of my older musings, check out my research blog.
I did my PhD under the supervision of David J. Fleet and Aaron Hertzmann in the Computer Vision group at the University of Toronto. I am now a research team lead at Borealis AI (RBC Institute for Research). I can be reached at: caoy at domain name cs.toronto.edu.
Introductory machine learning course at UofT Scarborough:
Previously, I TA'ed for these courses:
Selected Publications (not updated)
Sara Sabour*, Yanshuai Cao*, Fartash Faghri, David J. Fleet
Accepted for ICLR 2016
We show that the representation of an image in a deep neural network (DNN) can be manipulated to mimic those of other natural images, with only minor, imperceptible perturbations to the original image. Previous methods for generating adversarial images focused on image perturbations designed to produce erroneous class labels, while we concentrate on the internal layers of DNN representations. In this way our new class of adversarial images differs qualitatively from others. While the adversary is perceptually similar to one image, its internal representation appears remarkably similar to a different image, one from a different class, bearing little if any apparent similarity to the input; these adversarial images appear generic and consistent with the space of natural images. This phenomenon raises questions about DNN representations, as well as the properties of natural images themselves.
Cao, Y., Fleet, D.J.
NIPS2015 Workshop on Nonparametric Methods for Large Scale Representation Learning, Montreal, 2015.
We introduce a framework for analyzing the transductive combination of Gaussian process (GP) experts, where independently trained GP experts are combined in a way that depends on the test point location, in order to scale GPs to big data. The framework provides some theoretical justification for the generalized product of GP experts (gPoE-GP), which was previously shown to work well in practice but lacked a theoretical basis. Based on the proposed framework, an improvement over gPoE-GP is introduced and empirically validated.
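In symbols, a transductive product-of-experts combination takes roughly the following form (a sketch in standard notation, not copied from the paper):

```latex
p(y_\ast \mid x_\ast) \;\propto\; \prod_{i=1}^{M} p_i(y_\ast \mid x_\ast)^{\alpha_i(x_\ast)},
\qquad \alpha_i(x_\ast) \ge 0 .
```

The key point is that the weights \(\alpha_i(x_\ast)\) depend on the test location \(x_\ast\), which is what makes the combination transductive; particular choices of these weights (for instance, based on each expert's predictive uncertainty at \(x_\ast\)) recover gPoE-GP.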
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 12, pp. 2415-2427, Dec. 2015.
Cao, Y., Fleet, D.J.
Modern Nonparametrics 3: Automating the Learning Pipeline workshop at NIPS, Montreal, 2014.
In this work, we propose a generalized product of experts (gPoE) framework for combining the predictions of multiple probabilistic models. We identify four desirable properties that are important for scalability, expressiveness, and robustness when learning and inferring with a combination of multiple models. Through analysis and experiments, we show that the gPoE of Gaussian processes (GPs) has these qualities, while no other existing combination scheme satisfies all of them at the same time. The resulting gPoE-GP is highly scalable, as individual GP experts can be learned independently in parallel; very expressive, as the way experts are combined depends on the input rather than being fixed; still a valid probabilistic model with a natural interpretation; and, finally, robust to unreliable predictions from individual experts.
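For Gaussian experts, combining predictions under a product of experts with given weights reduces to precision-weighted averaging. The sketch below is illustrative only (not the paper's code); the function name, the API, and the uniform default weights are my own choices:

```python
import numpy as np

def gpoe_combine(means, variances, weights=None):
    """Combine independent Gaussian expert predictions via a
    generalized product of experts (gPoE).

    means, variances : arrays of shape (n_experts, n_test), each
        expert's predictive mean and variance at the test points.
    weights : per-expert, per-test-point weights alpha_i(x); if None,
        uniform weights 1/n_experts are used (an illustrative choice).
    Returns the combined predictive mean and variance per test point.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    if weights is None:
        weights = np.full_like(means, 1.0 / means.shape[0])
    # Each expert contributes a weighted precision alpha_i / sigma_i^2.
    precisions = weights / variances
    comb_precision = precisions.sum(axis=0)
    comb_var = 1.0 / comb_precision
    # Combined mean is the precision-weighted average of expert means.
    comb_mean = comb_var * (precisions * means).sum(axis=0)
    return comb_mean, comb_var
```

Note that because each expert is trained independently, this combination step is embarrassingly parallel up to the final reduction, which is what makes the scheme scalable.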
Advances in Neural Information Processing Systems (NIPS), Lake Tahoe, 2013.
We propose an efficient optimization algorithm for selecting a subset of training data to induce sparsity for Gaussian process regression. The algorithm estimates an inducing set and the hyperparameters using a single objective, either the marginal likelihood or a variational free energy. The space and time complexity are linear in the training set size, and the algorithm can be applied to large regression problems on discrete or continuous domains. Empirical evaluation shows state-of-the-art performance in the discrete case and competitive results in the continuous case.
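For context, the exact GP log marginal likelihood that such objectives approximate is, in standard notation (with kernel matrix \(K_\theta\) on the \(n\) training inputs and noise variance \(\sigma^2\)):

```latex
\log p(\mathbf{y} \mid X, \theta)
= -\tfrac{1}{2}\, \mathbf{y}^\top \left(K_\theta + \sigma^2 I\right)^{-1} \mathbf{y}
\;-\; \tfrac{1}{2}\, \log \bigl\lvert K_\theta + \sigma^2 I \bigr\rvert
\;-\; \tfrac{n}{2}\, \log 2\pi .
```

Evaluating this exactly costs \(O(n^3)\) time, which is what motivates inducing-set approximations with the linear complexity described above.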
Hobbies and other interests
I love Chinese poetry and occasionally write some myself.
I also love the arts of Muay Thai and boxing, and I used to train at WMT.