I'm a first-year PhD student at the University of Toronto and the Vector Institute, supervised by Roger Grosse and Geoffrey Hinton.
I'm interested in understanding the computational mechanisms that give rise to intelligence. To this end, I'm currently working on improving the generalization and robustness of deep learning models, and on building algorithms that adapt quickly and learn from limited data.
Engineering Science, University of Toronto; Bachelor of Applied Science and Engineering
- Specialized in Robotics.
- 3.98 CGPA, First in Graduating Class in Engineering Science, 2019.
- Preventing Gradient Attenuation in Lipschitz Constrained Convolutional Networks
Qiyang Li,* Saminul Haque,* Cem Anil, James Lucas, Roger Grosse, Jörn-Henrik Jacobsen
Lipschitz constraints under the L2 norm on deep neural networks are useful for provable adversarial robustness bounds, stable training, and Wasserstein distance estimation. In our earlier paper, we identified a key obstacle to training networks with a strict Lipschitz constraint, gradient norm attenuation, and developed methods to overcome it in the fully connected setting. In this paper, we extend these methods to convolutional networks. The architecture we develop achieves tight Lipschitz constraints using an expressive parameterization of orthogonal convolutions, which we call Block Convolutional Orthogonal Parameterization (BCOP). Our model achieves state-of-the-art performance on provable robustness for image classification tasks. (*equal contribution)
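To make gradient norm attenuation concrete, here is a toy sketch in plain Python. It is not BCOP and not the paper's code: it backpropagates a gradient through ten copies of a fully connected 2×2 layer and compares an orthogonal weight matrix with a non-orthogonal one of the same spectral norm (both matrices are arbitrary choices for illustration). Both layers are 1-Lipschitz, but only the orthogonal one keeps the gradient norm intact.

```python
"""Toy illustration: orthogonal layers preserve gradient norm; other 1-Lipschitz layers can shrink it."""
import math

Matrix = list[list[float]]
Vector = list[float]


def transpose_matvec(w: Matrix, g: Vector) -> Vector:
    """Backpropagate gradient g through y = W x, i.e. compute W^T g."""
    rows, cols = len(w), len(w[0])
    return [sum(w[i][j] * g[i] for i in range(rows)) for j in range(cols)]


def norm(v: Vector) -> float:
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def backprop_norm(w: Matrix, g: Vector, depth: int) -> float:
    """Norm of the gradient after passing back through `depth` copies of the linear layer W."""
    for _ in range(depth):
        g = transpose_matvec(w, g)
    return norm(g)


# Both matrices have spectral norm 1, so each layer is 1-Lipschitz under the L2 norm.
ORTHOGONAL: Matrix = [[0.6, -0.8], [0.8, 0.6]]  # a rotation: W^T W = I
CONTRACTING: Matrix = [[1.0, 0.0], [0.0, 0.5]]  # 1-Lipschitz but not orthogonal

if __name__ == "__main__":
    g0 = [0.0, 1.0]
    print(backprop_norm(ORTHOGONAL, g0, depth=10))   # stays close to 1.0
    print(backprop_norm(CONTRACTING, g0, depth=10))  # 0.5 ** 10
```

Because the rotation satisfies W^T W = I, every backward step is norm-preserving, whereas the contracting matrix halves one gradient component per layer even though it never violates the Lipschitz bound.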
- Sorting Out Lipschitz Function Approximation
Cem Anil,* James Lucas,* Roger Grosse
Training neural networks with a desired Lipschitz constant is useful for provable adversarial robustness, Wasserstein distance estimation, and generalization. The challenge is to do this while retaining expressive power. In this paper, we first identify a pathology shared by previous attempts to build provably Lipschitz architectures, then develop a new architecture that overcomes this pathology. Our architecture makes use of a new activation function based on sorting: GroupSort. Empirically, GroupSort networks achieve tighter estimates of Wasserstein distance and can achieve provable adversarial robustness guarantees with little cost to accuracy. (*equal contribution)
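A minimal sketch of the GroupSort activation, assuming the formulation where pre-activations are split into contiguous groups of a fixed size and each group is sorted. The ascending sort order and the function name `group_sort` are choices made here for illustration, not taken from the paper's code.

```python
"""Minimal GroupSort activation sketch on plain Python lists."""


def group_sort(x: list[float], group_size: int = 2) -> list[float]:
    """Split x into contiguous groups of `group_size` and sort each group (ascending here)."""
    if group_size < 1 or len(x) % group_size != 0:
        raise ValueError("len(x) must be a positive multiple of group_size")
    out: list[float] = []
    for start in range(0, len(x), group_size):
        out.extend(sorted(x[start:start + group_size]))
    return out


if __name__ == "__main__":
    x = [3.0, -1.0, 0.5, 2.0, -4.0, 1.0]
    print(group_sort(x, 2))  # [-1.0, 3.0, 0.5, 2.0, -4.0, 1.0]
    print(group_sort(x, 6))  # [-4.0, -1.0, 0.5, 1.0, 2.0, 3.0]
```

Because each group's output is a permutation of its input, GroupSort preserves the L2 norm of its input, and its Jacobian is (almost everywhere) a permutation matrix, so gradient norms pass through unchanged. A group size of 2 reduces to a pairwise min/max, while a group size equal to the layer width sorts the whole vector.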
- TimbreTron: A WaveNet(CycleGAN(CQT(Audio))) Pipeline for Musical Timbre Transfer
Sicong Huang, Qiyang Li, Cem Anil, Xuchan Bao, Sageev Oore, Roger Grosse
In this work, we address the problem of musical timbre transfer, where the goal is to manipulate the timbre of a sound sample from one instrument to match another instrument while preserving other musical content, such as pitch, rhythm, and loudness. We introduce TimbreTron, which applies "image" domain style transfer to a time-frequency representation of the audio signal, and then produces a high-quality waveform using a conditional WaveNet synthesizer. We show that the Constant Q Transform (CQT) representation is particularly well-suited to convolutional architectures due to its approximate pitch equivariance. Based on human perceptual evaluations, we confirmed that TimbreTron recognizably transferred the timbre while otherwise preserving the musical content, for both monophonic and polyphonic samples.
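The pitch-equivariance argument comes down to arithmetic on the CQT's geometrically spaced bins. The sketch below assumes one bin per semitone and a lowest bin near C1 (`F_MIN`, `B`, and the helper names are hypothetical), and shows that transposing a note by n semitones moves it by the same n bins regardless of its starting pitch.

```python
"""Why CQT bins make pitch shifts look like translations along the frequency axis."""
import math


def cqt_center_frequency(k: int, f_min: float, bins_per_octave: int) -> float:
    """Center frequency of CQT bin k: bins are spaced geometrically, f_k = f_min * 2**(k / B)."""
    return f_min * 2.0 ** (k / bins_per_octave)


def fractional_bin(freq: float, f_min: float, bins_per_octave: int) -> float:
    """Inverse map: the (possibly fractional) CQT bin index at which `freq` falls."""
    return bins_per_octave * math.log2(freq / f_min)


F_MIN = 32.70  # assumed lowest bin, roughly C1 in Hz
B = 12         # assumed one bin per semitone

if __name__ == "__main__":
    a4 = 440.0
    for semitones in (1, 7, 12):
        shifted = a4 * 2.0 ** (semitones / 12)  # transpose A4 up by `semitones`
        shift_in_bins = fractional_bin(shifted, F_MIN, B) - fractional_bin(a4, F_MIN, B)
        print(semitones, round(shift_in_bins, 6))  # equals `semitones`, independent of F_MIN
```

The same holds for every harmonic h·f0 of the note, since multiplying all frequencies by 2^(n/12) adds the same constant to each log-frequency bin index. A pitch shift therefore looks like a translation of the whole harmonic pattern, which is the structure a convolution can exploit; for real recordings this holds only approximately.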
- Training Deep Networks With Synthetic Data: Bridging the Reality Gap by Domain Randomization
Jonathan Tremblay,* Aayush Prakash,* David Acuna,* Mark Brophy,* Varun Jampani, Cem Anil, Thang To, Eric Cameracci, Shaad Boochoon, Stan Birchfield
We present a system for training deep neural networks for object detection using synthetic images. To handle the variability in real-world data, the system relies upon domain randomization, in which the parameters of the simulator (such as lighting, pose, and object textures) are randomized in non-realistic ways to force the neural network to learn the essential features of the object of interest. We explore the importance of these parameters, showing that it is possible to produce a network with compelling performance using only non-artistically-generated synthetic data. With additional fine-tuning on real data, the network outperforms one trained on real data alone. This result opens up the possibility of using inexpensive synthetic data for training neural networks while avoiding the need to collect large amounts of hand-annotated real-world data or to generate high-fidelity synthetic worlds, both of which remain bottlenecks for many applications. (*equal contribution)
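A minimal sketch of the sampling side of domain randomization, assuming a simulator that accepts per-scene lighting, pose, and texture settings. The `SceneParams` fields, their ranges, and `sample_scene` are hypothetical; the point is only that every parameter is drawn independently and without regard for realism, using a seeded generator so the synthetic dataset can be regenerated.

```python
"""Sketch of per-scene domain randomization of simulator parameters (hypothetical parameters)."""
import random
from dataclasses import dataclass


@dataclass
class SceneParams:
    light_intensity: float    # hypothetical units and range
    light_azimuth_deg: float  # direction of the light source
    object_yaw_deg: float     # object pose
    texture_id: int           # index into a pool of (possibly unrealistic) textures


def sample_scene(rng: random.Random, num_textures: int = 1000) -> SceneParams:
    """Draw every parameter independently and uniformly, ignoring realism."""
    return SceneParams(
        light_intensity=rng.uniform(0.1, 10.0),
        light_azimuth_deg=rng.uniform(0.0, 360.0),
        object_yaw_deg=rng.uniform(0.0, 360.0),
        texture_id=rng.randrange(num_textures),
    )


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so the synthetic dataset is reproducible
    for _ in range(5):
        # Each sampled scene would then be rendered and auto-labelled by the simulator.
        print(sample_scene(rng))
```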