I am a second-year PhD student supervised jointly by Rich Zemel and Roger Grosse.

I am interested in a wide range of questions applicable to deep learning. How do we train neural networks which generalize well? How should we be optimizing neural networks? How does optimization affect generalization? How can we impose functional constraints on neural networks? Can we utilize uncertainty effectively in deep neural networks?



Sorting out Lipschitz function approximation: Common activation functions are insufficient for norm-constrained (1-Lipschitz) network architectures. By using a gradient-norm-preserving activation, GroupSort, we prove universal approximation in this setting and achieve provable adversarial robustness with hinge loss.
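
As a rough illustration of the idea (a minimal sketch, not the paper's implementation), a GroupSort-style activation can be written as sorting the pre-activations within fixed-size groups. The function name `group_sort`, the plain-list representation, and the choice of ascending order are assumptions made for this example.

```python
def group_sort(x, group_size):
    """Sort each consecutive group of `group_size` entries in ascending order.

    Hypothetical helper for illustration: `x` is a flat list of
    pre-activations whose length must be divisible by `group_size`.
    """
    if group_size <= 0 or len(x) % group_size != 0:
        raise ValueError("len(x) must be a positive multiple of group_size")
    out = []
    for start in range(0, len(x), group_size):
        # Sorting only reorders the entries inside this group.
        out.extend(sorted(x[start:start + group_size]))
    return out


x = [3.0, -1.0, 0.5, 2.0]
y = group_sort(x, 2)  # groups are (3.0, -1.0) and (0.5, 2.0)
```

Because the output is a permutation of the input within each group, the activation leaves the norm of the vector unchanged, which is the sense in which it preserves gradient norm.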

Aggregated Momentum: Stability Through Passive Damping: A simple trick for improving momentum optimization. It is trivial to implement and keeps optimization stable even at aggressive damping parameters (e.g. 0.999).
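
To give a feel for how small the change is, here is a minimal sketch of one update step. It assumes the method keeps several velocity terms, each with its own damping parameter, and averages them when updating the parameters. The function name `aggmo_step`, the scalar parameter, and the specific damping values are illustrative assumptions, not taken from the paper's code.

```python
def aggmo_step(theta, velocities, grad, lr, betas):
    """One aggregated-momentum step on a scalar parameter (illustrative only).

    Assumed update: each velocity decays by its own damping parameter and
    accumulates the negative gradient; the parameter then moves along the
    average of all velocities, scaled by the learning rate.
    """
    new_velocities = [beta * v - grad for beta, v in zip(betas, velocities)]
    theta = theta + lr * sum(new_velocities) / len(betas)
    return theta, new_velocities


# Two steps on f(theta) = 0.5 * theta**2, whose gradient is theta itself.
betas = [0.0, 0.9, 0.99]
theta, velocities = 1.0, [0.0, 0.0, 0.0]
for _ in range(2):
    theta, velocities = aggmo_step(theta, velocities, grad=theta, lr=0.1, betas=betas)
```

With a single damping parameter this reduces to ordinary momentum, so the only extra state is one velocity per additional damping value.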


(ICML 2018) - Adversarial Distillation of Bayesian Neural Network Posteriors: Using SGLD with GANs to produce posterior samples for BNNs. This retains the flexibility of MCMC but with constant memory cost at test time.


In the Fall of 2017 I taught CSC411/2515 - Introduction to Machine Learning.