I am a third-year PhD student supervised jointly by Rich Zemel and Roger Grosse.

I am interested in a wide range of questions in deep learning. How do we train neural networks that generalize well? How should we optimize neural networks? How can we impose functional constraints on neural networks? Can we make effective use of uncertainty in deep neural networks?

Publications

(NeurIPS 2019) - Don't Blame the ELBO: A Linear VAE Perspective on Posterior Collapse: Through analyzing the linear VAE we identify the role that optimization plays in posterior collapse. We conduct a thorough empirical investigation and find that the linear VAE is largely predictive of posterior collapse in deep VAE architectures.

(NeurIPS 2019) - Lookahead Optimizer: k steps forward, 1 step back: Lookahead iteratively updates two sets of weights. Intuitively, the algorithm chooses a search direction by looking ahead at the sequence of "fast weights" generated by another optimizer and then uses linear interpolation to update the "slow weights".
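The slow/fast-weight loop described above can be sketched as follows. This is a minimal illustration, assuming plain SGD as the inner ("fast") optimizer; the function name and hyperparameter defaults are illustrative, not the paper's reference implementation.

```python
import numpy as np

def lookahead_sgd(grad_fn, w0, lr=0.1, k=5, alpha=0.5, n_outer=20):
    """Minimal Lookahead sketch with SGD as the inner optimizer.

    The fast weights take k SGD steps from the slow weights; the slow
    weights then move toward them by linear interpolation with rate alpha.
    """
    slow = np.asarray(w0, dtype=float)
    for _ in range(n_outer):
        fast = slow.copy()
        for _ in range(k):                 # k steps forward (fast weights)
            fast -= lr * grad_fn(fast)
        slow += alpha * (fast - slow)      # 1 step back (slow weights)
    return slow
```

On a simple quadratic (`grad_fn = lambda w: w`), the slow weights contract toward the minimum each outer step, which is the intuition behind the improved stability.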

(NeurIPS 2019) - Preventing Gradient Attenuation in Lipschitz Constrained Convolutional Networks: We identify several limitations of commonly used approaches to constraining the Lipschitz constant of convolutional architectures. By focusing on a gradient-norm-preserving design principle, we overcome these issues and recover expressive, Lipschitz-constrained convolutional neural networks.

(ICML 2019) - Sorting out Lipschitz function approximation: Common activation functions are insufficient for norm-constrained (1-Lipschitz) network architectures. By using a gradient-norm-preserving activation, GroupSort, we prove universal approximation in this setting and achieve provable adversarial robustness with a hinge loss.
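A minimal sketch of the activation with group size 2 (often called MaxMin): each adjacent pair of pre-activations is sorted. Since sorting only permutes its inputs, the Jacobian is a permutation matrix and the gradient norm is preserved, unlike ReLU, which zeroes coordinates. This version assumes a 1-D input of even length for simplicity.

```python
import numpy as np

def maxmin(x):
    """GroupSort with group size 2: sort each adjacent pair, largest first.

    Sorting permutes its inputs, so the activation is gradient norm
    preserving (its Jacobian is a permutation matrix almost everywhere).
    """
    pairs = np.asarray(x, dtype=float).reshape(-1, 2)
    return np.sort(pairs, axis=1)[:, ::-1].ravel()
```

For example, `maxmin([1, 3, -2, 0])` reorders the pairs to `[3, 1, 0, -2]`; note the input and output have identical norms.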

(ICLR 2019) - Aggregated Momentum: Stability Through Passive Damping: A simple trick for improving momentum optimization. It is trivial to implement and keeps optimization stable even at aggressive damping parameters (e.g. 0.999).
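A sketch of the idea, under my assumed form of the update rule: maintain one velocity per damping coefficient and step with their average, so the less-damped velocities passively damp the oscillations of the aggressive one. Function name and defaults are illustrative.

```python
import numpy as np

def aggmo(grad_fn, w0, lr=0.1, betas=(0.0, 0.9, 0.99), n_steps=500):
    """Aggregated Momentum sketch (assumed update rule).

    Each velocity follows classical momentum with its own damping beta;
    the parameter step averages all velocities, stabilizing the update.
    """
    w = np.asarray(w0, dtype=float)
    velocities = [np.zeros_like(w) for _ in betas]
    for _ in range(n_steps):
        g = grad_fn(w)
        for i, beta in enumerate(betas):
            velocities[i] = beta * velocities[i] - g  # momentum per beta
        w = w + (lr / len(betas)) * sum(velocities)
    return w
```

On a quadratic (`grad_fn = lambda w: w`) the averaged update converges even though the most aggressive velocity alone would oscillate for a long time.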


(ICML 2018) - Adversarial Distillation of Bayesian Neural Network Posteriors: We use SGLD with GANs to produce posterior samples for BNNs. This retains the flexibility of MCMC but with a constant memory cost at test time.

Workshop publications

(NeurIPS 2019 MLWG - Contributed Oral) - Information-theoretic limitations on novel task generalization: We provide novel information-theoretic lower bounds on the minimax rates of convergence for algorithms that are trained on data from multiple sources and tested on novel data.

(ICLR 2019) - Understanding posterior collapse in generative latent variable models: We study the linear VAE model and, using a direct correspondence to pPCA, analyze its loss landscape.


In the Fall of 2017 I taught CSC411/2515 - Introduction to Machine Learning.