I am a second-year PhD student supervised jointly by Rich Zemel and Roger Grosse.

I am interested in a wide range of questions applicable to deep learning. How do we train neural networks which generalize well? How should we be optimizing neural networks? How does optimization affect generalization? How can we impose functional constraints on neural networks? Can we utilize uncertainty effectively in deep neural networks?

Conference publications

(ICML 2019) - Sorting out Lipschitz function approximation: Common activation functions are insufficient for norm-constrained (1-Lipschitz) network architectures. Using a gradient-norm-preserving activation, GroupSort, we prove universal approximation in this setting and achieve provable adversarial robustness with hinge loss.
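As a rough illustration of the GroupSort idea, the sketch below splits a vector of pre-activations into groups and sorts each group. Because sorting only permutes entries, the vector norm is unchanged. The function name `group_sort`, the use of consecutive groups along the last axis, the ascending sort order, and the default group size of 2 are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def group_sort(x, group_size=2):
    """Sort entries within consecutive groups along the last axis.

    Assumption: groups are consecutive chunks of `group_size` entries,
    sorted in ascending order.
    """
    x = np.asarray(x)
    n = x.shape[-1]
    if n % group_size != 0:
        raise ValueError("last dimension must be divisible by group_size")
    # Reshape so each group occupies its own trailing axis, then sort it.
    grouped = x.reshape(*x.shape[:-1], n // group_size, group_size)
    return np.sort(grouped, axis=-1).reshape(x.shape)

x = np.array([3.0, 1.0, 2.0, 5.0])
y = group_sort(x, group_size=2)
# Sorting permutes entries, so the Euclidean norm is preserved.
same_norm = np.isclose(np.linalg.norm(x), np.linalg.norm(y))
```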

(ICLR 2019) - Aggregated Momentum: Stability Through Passive Damping: A simple trick for improving momentum optimization. It is trivial to implement and keeps optimization stable even at aggressive damping parameters (e.g. 0.999).
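A minimal sketch of one way such an update can be written, assuming the optimizer keeps several velocity vectors, each with its own damping parameter, and moves the parameters along their average. The function name `aggmo_step`, the learning rate, and the particular damping values are illustrative choices for this example rather than recommended settings.

```python
import numpy as np

def aggmo_step(theta, velocities, grad, lr, betas):
    """One update with several velocities, one per damping parameter.

    Assumption: each velocity follows the classical momentum rule
    v <- beta * v - grad, and the parameters move along the average.
    """
    new_velocities = [b * v - grad for b, v in zip(betas, velocities)]
    theta = theta + (lr / len(betas)) * sum(new_velocities)
    return theta, new_velocities

# Toy problem: minimize f(theta) = 0.5 * theta**2, whose gradient is theta.
betas = [0.0, 0.9, 0.99]          # illustrative damping parameters
theta = np.array([1.0])
velocities = [np.zeros_like(theta) for _ in betas]
theta, velocities = aggmo_step(theta, velocities, grad=theta, lr=0.1, betas=betas)
```

With zero initial velocities, the first step reduces to a plain gradient step, since every velocity equals the negative gradient.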


(ICML 2018) - Adversarial Distillation of Bayesian Neural Network Posteriors: Using SGLD with GANs to produce posterior samples for BNNs. It has the flexibility of MCMC but with constant memory cost at test time.

Workshop publications

Understanding posterior collapse in generative latent variable models: We study the Linear VAE model and, using a direct correspondence to pPCA, analyze its loss landscape.


In Fall 2017, I taught CSC411/2515 - Introduction to Machine Learning.