Will Grathwohl

Email: wgrathwohl[ at ]cs[ dot ]toronto[ dot ]edu
Github: wgrathwohl
Linkedin: will-grathwohl
CV

Drawing inspiration from many of my great math professors from undergrad, this website will be poorly made and have lots of improperly formatted HTML.

I completed my undergraduate degree in Mathematics at MIT in 2014. I am now a PhD student in the Machine Learning Group here at the University of Toronto.

I am co-supervised by Richard Zemel and David Duvenaud.

The bulk of my PhD has focused on generative models: how they can be made more flexible, expressive, and unconstrained, and how they can be applied to downstream discriminative tasks. I have published work on Variational Autoencoders, Normalizing Flows, and most recently Energy-Based Models.

As of February 2019, I also work part-time at Google Brain in Toronto.

My latest paper, Oops I Took A Gradient: Scalable Sampling for Discrete Distributions, is out now. We present a new approach for MCMC sampling from discrete distributions that enables the training of Deep Energy-Based Models on discrete data!

Papers

Conference

No MCMC for me: Amortized sampling for fast and stable training of energy-based models: Will Grathwohl, Jacob Kelly, Milad Hashemi, Mohammad Norouzi, Kevin Swersky, David Duvenaud
ICLR 2021.
We present a new method for training Energy-Based Models. Our method uses a generator to amortize the sampling typically used in EBM training. Key to our approach is a new, fast method to regularize the entropy of latent-variable generators. We demonstrate that training in this way is faster and more stable than MCMC-based training. This leads to improved performance of JEM models and allows JEM to be applied to semi-supervised learning for tabular data, outperforming Virtual Adversarial Training.
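
To make the training setup concrete, here is a minimal PyTorch-style sketch of the amortized objective under my own naming (energy, generator, and entropy_estimate are assumptions, and entropy_estimate is only a placeholder for the fast entropy-gradient estimator developed in the paper); this is an illustration, not the released code.

    import torch

    def entropy_estimate(generator, z):
        # Placeholder: the paper develops a fast estimator of the generator's entropy;
        # returning 0 here just keeps the sketch self-contained.
        return torch.tensor(0.0)

    def amortized_ebm_losses(energy, generator, x_data, z, lam=1.0):
        # energy: R^D -> R (unnormalized negative log-density), generator: z -> R^D.
        x_gen = generator(z)
        # Energy loss: lower the energy of data, raise the energy of generator samples.
        loss_energy = energy(x_data).mean() - energy(x_gen.detach()).mean()
        # Generator loss: produce low-energy samples while keeping entropy high.
        loss_generator = energy(x_gen).mean() - lam * entropy_estimate(generator, z)
        return loss_energy, loss_generator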

Learning the Stein Discrepancy for Training and Evaluating Energy-Based Models without Sampling: Will Grathwohl, Kuan-Chieh Wang, Jorn-Henrik Jacobsen, David Duvenaud, Richard Zemel.
ICML 2020.
We present a new method for training and evaluating unnormalized density models. Our method is based on estimating the Stein Discrepancy between our model and the data distribution. Unlike other discrepancy measures, the Stein Discrepancy requires only an unnormalized model and samples from the data distribution to evaluate. We train a neural network to estimate this discrepancy and show that it can be used for goodness-of-fit testing, model evaluation, and model training. Our method greatly outperforms previous kernel-based methods for estimating Stein Discrepancies.
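
Roughly, the quantity involved is the Stein discrepancy E_{x ~ data}[ s_q(x) · f(x) + div f(x) ], where s_q = grad_x log q is the model's (unnormalized) score and f is a learned critic network. Below is a minimal PyTorch sketch with my own naming (log_q and critic are assumptions; the divergence uses a single Hutchinson probe, and in the paper the critic is trained with an additional L2 penalty); it is not the paper's code.

    import torch

    def stein_discrepancy_estimate(x, log_q, critic):
        # x: [B, D] data batch; log_q: unnormalized log-density; critic: R^D -> R^D.
        x = x.detach().requires_grad_(True)
        score, = torch.autograd.grad(log_q(x).sum(), x, create_graph=True)
        f = critic(x)
        eps = torch.randint_like(x, 0, 2) * 2 - 1        # Rademacher probe
        jvp, = torch.autograd.grad(f, x, grad_outputs=eps, create_graph=True)
        div_f = (jvp * eps).sum(dim=1)                    # single-probe unbiased estimate of div f(x)
        return ((score * f).sum(dim=1) + div_f).mean()    # maximized over the critic, minimized over the model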

Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One: Will Grathwohl, Jackson Wang, Jorn-Henrik Jacobsen, David Duvenaud, Mohammad Norouzi, Kevin Swersky.
ICLR 2020. Oral Presentation
My Talk.
We show that you can reinterpret standard classification architectures as energy-based generative models and train them as such. Doing this allows us to achieve SOTA performance at BOTH generative and discriminative modeling with a single model. Adding this energy-based training also brings other surprising benefits, such as improved calibration, mechanisms for out-of-distribution detection, and adversarial robustness!
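
The reinterpretation itself is compact. A hedged PyTorch sketch (my own minimal version, not the paper's code): the logits f(x)[y] of any classifier define p(y|x) through a softmax and, at the same time, an unnormalized density p(x) through a logsumexp.

    import torch
    import torch.nn as nn

    # Any classifier with logits f(x)[y] defines a joint EBM: p(x, y) is proportional to exp(f(x)[y]),
    # so p(y|x) = softmax(f(x)) and log p(x) = logsumexp_y f(x)[y] - log Z.
    classifier = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

    def logits(x):
        return classifier(x)                        # the standard cross-entropy loss uses these for p(y|x)

    def unnormalized_log_px(x):
        return torch.logsumexp(logits(x), dim=-1)   # equals log p(x) + log Z; the energy is its negative

Training then adds a maximum-likelihood term on log p(x), whose intractable gradient is approximated with SGLD samples, on top of the usual cross-entropy loss.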

Understanding the Limitations of Conditional Generative Models: Ethan Fetaya, Jörn-Henrik Jacobsen, Will Grathwohl, Richard Zemel.
ICLR 2020.
We examine the performance of conditional generative models for discriminative tasks and study why they fail to perform as well as purely discriminative models. We provide theoretical justification for this failing and provide a new dataset which demonstrates our theory.

Invertible Residual Networks: Jens Behrmann*, Will Grathwohl*, Ricky T. Q. Chen, David Duvenaud, Jörn-Henrik Jacobsen* (*equal contribution)
ICML 2019. Long Oral Presentation.
We make ResNets invertible without dimension-splitting heuristics. We demonstrate that these models can be used to build state-of-the-art generative and discriminative models.
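
The core constraint is simple to state: a residual block y = x + g(x) is invertible whenever g is a contraction (Lipschitz constant below 1), and its inverse can be computed by fixed-point iteration. A hedged sketch follows (PyTorch's built-in spectral normalization is a stand-in here; the paper rescales weights so each layer's spectral norm is strictly below 1).

    import torch
    import torch.nn as nn

    # Residual block y = x + g(x), invertible as long as Lip(g) < 1.
    g = nn.Sequential(
        nn.utils.spectral_norm(nn.Linear(64, 64)), nn.ELU(),
        nn.utils.spectral_norm(nn.Linear(64, 64)),
    )

    def forward(x):
        return x + g(x)

    def invert(y, n_iters=100):
        # Fixed-point iteration x <- y - g(x), which converges when g is contractive.
        x = y.clone()
        for _ in range(n_iters):
            x = y - g(x)
        return x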

FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models: Will Grathwohl*, Ricky T. Q. Chen*, Jesse Bettencourt, Ilya Sutskever, David Duvenaud. (*equal contribution)
ICLR 2019. Oral Presentation.
My Talk
We use the recently proposed Neural ODEs to construct a state-of-the-art flow-based generative model!
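
The model pushes samples through an ODE dz/dt = f(z, t), and the log-density evolves according to the instantaneous change of variables, d log p(z(t))/dt = -Tr(df/dz), with the trace estimated by Hutchinson's estimator. Below is a rough evaluation-only sketch with a fixed-step Euler integrator (the paper uses an adaptive ODE solver; dynamics is an assumed callable); it is not the released code.

    import torch

    def hutchinson_trace(f_out, z, eps):
        # Single-probe unbiased estimate of Tr(df/dz): eps^T (df/dz) eps with Rademacher eps.
        jvp, = torch.autograd.grad(f_out, z, grad_outputs=eps, create_graph=True)
        return (jvp * eps).sum(dim=1)

    def cnf_forward(x, dynamics, n_steps=20):
        # Integrate data -> base noise; log p(x) = log p_base(z(1)) + integral_0^1 Tr(df/dz) dt.
        z = x.detach().requires_grad_(True)
        logdet = torch.zeros(x.shape[0])
        dt = 1.0 / n_steps
        for k in range(n_steps):
            f = dynamics(z, k * dt)
            eps = torch.randint_like(z, 0, 2) * 2 - 1
            logdet = logdet + hutchinson_trace(f, z, eps) * dt
            z = z + f * dt
        return z, logdet   # evaluate the base density at z and add logdet to get log p(x)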

Backpropagation through the Void: Optimizing control variates for black-box gradient estimation: Will Grathwohl, Dami Choi, Yuhuai Wu, Geoff Roeder, David Duvenaud.
ICLR 2018.
We present a general method for estimating gradients of expectations of functions of random variables.
Our method can be applied to distributions over discrete random variables, or even when the function being optimized is not known!
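
Roughly, in the paper's notation: for a discrete sample b ~ p(b|θ), a continuous relaxation z of b (with b = H(z) for a hard threshold H), and a matching conditional relaxation z̃ ~ p(z|b, θ), the estimator takes the form

    ĝ = [f(b) - c_φ(z̃)] ∇_θ log p(b|θ) + ∇_θ c_φ(z) - ∇_θ c_φ(z̃),

where c_φ is a free-form, learned control variate (a neural network) trained to minimize the variance of ĝ; the estimator stays unbiased for any choice of c_φ.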

Preprints

Oops I Took A Gradient: Scalable Sampling for Discrete Distributions: Will Grathwohl, Kevin Swersky, Milad Hashemi, David Duvenaud, Chris J. Maddison
arXiv Preprint
We present a new approach to MCMC sampling for discrete distributions. Our approach exploits structure that is present in many discrete distributions of interest, namely gradients, which we use to inform Metropolis-Hastings proposals. This new sampler greatly outperforms prior samplers for discrete distributions, such as the Gibbs and Hamming Ball samplers, and for the first time enables the training of Deep EBMs on high-dimensional discrete data.
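
A minimal single-chain PyTorch sketch of the idea for binary variables (my own illustration, not the released code): a first-order Taylor expansion of the log-probability scores every possible one-coordinate flip, a flip is proposed from a softmax over those scores, and a Metropolis-Hastings correction keeps the chain exact.

    import torch

    def gradient_informed_flip(x, log_prob, temp=2.0):
        # x: float vector in {0, 1}^D; log_prob: differentiable unnormalized log-probability.
        x = x.detach().requires_grad_(True)
        fx = log_prob(x)
        grad, = torch.autograd.grad(fx, x)
        delta = (1 - 2 * x) * grad                        # Taylor estimate of f(flip_i(x)) - f(x)
        q_fwd = torch.distributions.Categorical(logits=delta / temp)
        i = q_fwd.sample()                                # which coordinate to flip

        x_prop = x.detach().clone()
        x_prop[i] = 1 - x_prop[i]
        x_prop.requires_grad_(True)
        fx_prop = log_prob(x_prop)
        grad_prop, = torch.autograd.grad(fx_prop, x_prop)
        q_rev = torch.distributions.Categorical(logits=((1 - 2 * x_prop) * grad_prop) / temp)

        # Metropolis-Hastings acceptance step keeps the sampler exact.
        log_alpha = fx_prop - fx + q_rev.log_prob(i) - q_fwd.log_prob(i)
        accept = torch.rand(()).log() < log_alpha
        return (x_prop if accept else x).detach()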

Workshops

Scaling RBMs to High Dimensional Data with Invertible Neural Networks: Will Grathwohl*, Xuechen Li*, Kevin Swersky, Milad Hashemi, Jorn-Henrik Jacobsen, Mohammad Norouzi, Geoffrey Hinton. (*equal contribution)
ICML Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models.

FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models: Will Grathwohl*, Ricky T. Q. Chen*, Jesse Bettencourt, Ilya Sutskever, David Duvenaud. (*equal contribution)
Symposium on Advances in Approximate Bayesian Inference 2018. Oral Presentation, Best Paper Award.

Modeling Global Class Structure Leads to Rapid Integration of New Classes: Will Grathwohl, Eleni Triantafillou, Xuechen Li, David Duvenaud and Richard Zemel.
NeurIPS 2018 Workshop on Meta-Learning
NeurIPS 2018 Workshop on Continual Learning

Training Glow with Constant Memory Cost: Xuechen Li, Will Grathwohl.
NIPS 2018 Workshop on Bayesian Deep Learning

Gradient-Based Optimization of Neural Network Architecture: Will Grathwohl, Elliot Creager, Kamyar Ghasemipour, Richard Zemel.
ICLR 2018 Workshop.

Backpropagation through the Void: Optimizing control variates for black-box gradient estimation: Will Grathwohl, Dami Choi, Yuhuai Wu, Geoff Roeder, David Duvenaud.
NIPS 2017 Deep Reinforcement Learning Symposium. Oral Presentation. A video of my talk can be found here.

Invited Talks

Using and Abusing Gradients for Discrete MCMC and Energy-Based Models: ICLR 2021 Workshop on Energy-Based Models, May 2021. Link.

Using and Abusing Gradients for Discrete MCMC and Energy-Based Models: CMU Artificial Intelligence Seminar Series, Feb 2021. Link.

Your Brain on Energy-Based Models: Seminar on Theoretical Machine Learning, Institute for Advanced Study, March 2020. Video.

Your Classifier is Secretly an Energy-Based Model and You Should Treat it Like One: Generative Models and Uncertainty, Copenhagen, Denmark, October 2019.
A workshop on generative models organized at the Technical University of Denmark.

Awards and Fellowships

Borealis AI Graduate Fellowship: A two-year, $50,000 fellowship supporting research in AI, funded by the Royal Bank of Canada.
Huawei Prize: A financial award based on academic and research performance.
ICLR 2018 Travel Award
Best Paper Award: Symposium on Advances in Approximate Bayesian Inference 2018