As of February 2019, I also work part-time at Google Brain in Toronto.
I was awarded the Google PhD Fellowship in Machine Learning for 2021 but had to decline it because I will graduate before the fellowship begins. I will complete my PhD this summer and will join DeepMind full-time as a Research Scientist in their New York City office in the fall.
Oops I Took A Gradient: Scalable Sampling for Discrete Distributions: Will Grathwohl, Kevin Swersky, Milad Hashemi, David Duvenaud, Chris J. Maddison
ICML 2021. Long Oral Presentation. Outstanding Paper Award Honorable Mention. My Talk.
We present a new approach to MCMC sampling for discrete distributions. Our approach exploits a structure present in many discrete distributions of interest, namely their gradients, which we use to inform proposals for Metropolis-Hastings. This new sampler greatly outperforms prior samplers for discrete distributions, such as Gibbs and the Hamming Ball sampler, and for the first time enables the training of deep EBMs on high-dimensional discrete data.
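To make the gradient-informed proposal concrete, here is a minimal sketch of a single-bit-flip sampler in the spirit of the paper's Gibbs-With-Gradients method, restricted to binary vectors. It is an illustration under stated assumptions, not the paper's implementation. It assumes a toy quadratic (Ising-style) model f(x) = xᵀWx + bᵀx whose gradient is written out by hand, where a real deep EBM would use autodiff. All function names are hypothetical. The gradient gives a first-order estimate of how much f changes when each bit is flipped. We sample the bit to flip from a softmax over those estimates and then apply the Metropolis-Hastings correction so the chain still targets the exact distribution.

```python
import math
import random

def log_prob(x, W, b):
    # Unnormalized log-probability of a binary vector under a toy quadratic model:
    # f(x) = x^T W x + b^T x  (assumed model, for illustration only)
    n = len(x)
    quad = sum(x[i] * W[i][j] * x[j] for i in range(n) for j in range(n))
    return quad + sum(b[i] * x[i] for i in range(n))

def grad_log_prob(x, W, b):
    # Gradient of f treating x as continuous: (W + W^T) x + b
    n = len(x)
    return [sum((W[i][j] + W[j][i]) * x[j] for j in range(n)) + b[i] for i in range(n)]

def flip_scores(x, W, b):
    # First-order estimate of f(x with bit i flipped) - f(x) for every i at once:
    # flipping 0 -> 1 adds df/dx_i, flipping 1 -> 0 subtracts it
    g = grad_log_prob(x, W, b)
    return [(1 - 2 * xi) * gi for xi, gi in zip(x, g)]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def gwg_step(x, W, b, rng):
    # Choose which bit to flip with probability softmax(scores / 2)
    q_fwd = softmax([d / 2 for d in flip_scores(x, W, b)])
    i = rng.choices(range(len(x)), weights=q_fwd)[0]
    x_new = list(x)
    x_new[i] = 1 - x_new[i]
    # Probability of proposing the reverse move (flipping bit i back) from x_new
    q_rev = softmax([d / 2 for d in flip_scores(x_new, W, b)])
    # Metropolis-Hastings acceptance keeps the exact target distribution invariant
    log_accept = (log_prob(x_new, W, b) - log_prob(x, W, b)
                  + math.log(q_rev[i]) - math.log(q_fwd[i]))
    if rng.random() < math.exp(min(0.0, log_accept)):
        return x_new
    return x
```

A single gradient evaluation gives flip estimates for all coordinates. An exact locally-informed proposal would instead need one evaluation of f per coordinate, which is the cost the gradient is used to avoid.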
No MCMC for me: Amortized sampling for fast and stable training of energy-based models: Will Grathwohl, Jacob Kelly, Milad Hashemi, Mohammad Norouzi, Kevin Swersky, David Duvenaud
We present a new method for training Energy-Based Models. Our method uses a generator to amortize the sampling typically used in EBM training. Key to our approach is a new, fast method to regularize the entropy of latent-variable generators. We demonstrate that training in this way is faster and more stable than MCMC-based training. This improves the performance of Joint Energy-Based Models (JEM) and allows JEM to be applied to semi-supervised learning on tabular data, where it outperforms Virtual Adversarial Training.
Learning the Stein Discrepancy for Training and Evaluating Energy-Based Models without Sampling: Will Grathwohl, Kuan-Chieh Wang, Jorn-Henrik Jacobsen, David Duvenaud, Richard Zemel.
We present a new method for training and evaluating unnormalized density models. Our method is based on estimating the Stein Discrepancy between our model and the data distribution. Unlike other discrepancy measures, the Stein Discrepancy requires only an unnormalized model and samples from the data distribution to evaluate. We train a neural network to estimate this discrepancy and show that it can be used for goodness-of-fit testing, model evaluation, and model training. Our method greatly outperforms previous kernel-based methods for estimating Stein Discrepancies.
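The claim that only an unnormalized model and data samples are needed can be seen in a small one-dimensional sketch. The Stein objective for a critic f is the data average of s(x)·f(x) + f′(x), where s(x) is the model's score d/dx log p̃(x). The normalizing constant disappears under that derivative. The setup below rests on stated assumptions: a linear critic f(x) = a·x + c stands in for the paper's neural network, an L2 penalty lam·mean[f(x)²] keeps the critic bounded, and the model is the unnormalized Gaussian p̃(x) = exp(−x²/2). The function names are hypothetical.

```python
import random

def model_score(x):
    # Score d/dx log p~(x) of the unnormalized model p~(x) = exp(-x^2 / 2).
    # The normalizing constant never appears: it vanishes under the derivative.
    return -x

def stein_objective(xs, a, c, lam):
    # Regularized Stein objective for the linear critic f(x) = a * x + c, estimated from
    # data samples xs: mean[s(x) f(x) + f'(x)] - lam * mean[f(x)^2]
    n = len(xs)
    stein = sum(model_score(x) * (a * x + c) + a for x in xs) / n
    penalty = sum((a * x + c) ** 2 for x in xs) / n
    return stein - lam * penalty

def learned_stein_discrepancy(xs, lam=0.5, lr=0.1, steps=500):
    # Fit the critic by gradient ascent on the (concave) objective, then report its value
    a, c = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = sum(model_score(x) * x + 1 - 2 * lam * (a * x + c) * x for x in xs) / n
        grad_c = sum(model_score(x) - 2 * lam * (a * x + c) for x in xs) / n
        a += lr * grad_a
        c += lr * grad_c
    return stein_objective(xs, a, c, lam)

rng = random.Random(0)
same = [rng.gauss(0.0, 1.0) for _ in range(2000)]     # data drawn from the model itself
shifted = [rng.gauss(1.0, 1.0) for _ in range(2000)]  # data the model does not fit
```

For `same`, the learned discrepancy should be close to 0. For `shifted`, it should be near 0.5, which is the population optimum of this objective for a mean-1, unit-variance data distribution with lam = 0.5.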
Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One: Will Grathwohl, Jackson Wang, Jorn-Henrik Jacobsen, David Duvenaud, Mohammad Norouzi, Kevin Swersky.
ICLR 2020. Oral Presentation. My Talk.
We show that standard classification architectures can be reinterpreted as energy-based generative models and trained as such. Doing this allows us to achieve SOTA performance at BOTH generative and discriminative modeling in a single model. Adding this energy-based training also brings other surprising benefits, such as improved calibration, mechanisms for out-of-distribution detection, and adversarial robustness!
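The reinterpretation itself takes only a few lines. The sketch below assumes a classifier's logits f(x) have already been computed for one input. It reads exp(f(x)[y]) as an unnormalized joint density p(x, y), so that E(x) = −LogSumExp_y f(x)[y] is an energy for p(x), while p(y | x) is still the ordinary softmax. Training is not shown; the paper fits the log p(x) term with sampling, which is omitted here. The function names are hypothetical.

```python
import math

def logsumexp(v):
    # Numerically stable log(sum(exp(v)))
    m = max(v)
    return m + math.log(sum(math.exp(t - m) for t in v))

def class_probs(logits):
    # p(y | x): the usual softmax classifier
    lse = logsumexp(logits)
    return [math.exp(t - lse) for t in logits]

def joint_energy(logits, y):
    # E(x, y) = -f(x)[y], so p(x, y) is proportional to exp(f(x)[y])
    return -logits[y]

def energy(logits):
    # E(x) = -logsumexp_y f(x)[y], so p(x) is proportional to exp(-E(x))
    return -logsumexp(logits)

logits = [2.0, 0.5, -1.0]                # hypothetical logits for a single input
shifted = [t + 3.0 for t in logits]      # same logits plus a constant
```

Adding a constant to every logit leaves `class_probs` unchanged but lowers `energy` by that constant. This scale is ignored by a standard classifier, and it is the extra degree of freedom that the energy-based view uses to model p(x).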
Understanding the Limitations of Conditional Generative Models: Ethan Fetaya, Jörn-Henrik Jacobsen, Will Grathwohl, Richard Zemel.
We examine the performance of conditional generative models on discriminative tasks and study why they fail to perform as well as purely discriminative models. We provide a theoretical justification for this failure and introduce a new dataset that demonstrates our theory.
Invertible Residual Networks: Jens Behrmann*, Will Grathwohl*, Ricky T. Q. Chen, David Duvenaud, Jorn-Henrik Jacobsen* (*equal contribution)
ICML 2019. Long Oral Presentation.
We make ResNets invertible without dimension-splitting heuristics. We demonstrate that these models can be used to build state-of-the-art generative and discriminative models.
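The idea behind invertibility without dimension splitting fits in a short sketch. A residual block y = x + g(x) is invertible whenever the residual branch g is a contraction (Lipschitz constant below 1). Its inverse can then be computed by the fixed-point iteration x ← y − g(x), which converges by the Banach fixed-point theorem. The paper enforces the constraint with spectral normalization. In the sketch below, a fixed hypothetical weight matrix W stands in for it: W's Frobenius norm (about 0.55) bounds its spectral norm, and tanh is 1-Lipschitz, so g(x) = tanh(Wx) is a contraction.

```python
import math

# Hypothetical weights: Frobenius norm ~0.55 < 1 bounds the spectral norm, so Lip(g) < 1
W = [[0.3, 0.2], [-0.1, 0.4]]

def g(x):
    # Residual branch g(x) = tanh(W x); tanh is 1-Lipschitz, so Lip(g) <= ||W||_2 < 1
    n = len(x)
    return [math.tanh(sum(W[i][j] * x[j] for j in range(n))) for i in range(n)]

def forward(x):
    # Residual block y = x + g(x)
    return [xi + gi for xi, gi in zip(x, g(x))]

def inverse(y, iters=100):
    # Fixed-point iteration x <- y - g(x); converges because x -> y - g(x) is a contraction
    x = list(y)
    for _ in range(iters):
        x = [yi - gi for yi, gi in zip(y, g(x))]
    return x
```

The error shrinks by at least a factor of Lip(g) per iteration. A tighter Lipschitz bound therefore means fewer iterations, at the cost of a less expressive residual branch.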
Modeling Global Class Structure Leads to Rapid Integration of New Classes: Will Grathwohl, Eleni Triantafillou, Xuechen Li, David Duvenaud and Richard Zemel.
NeurIPS 2018 Workshop on Meta-Learning
NeurIPS 2018 Workshop on Continual Learning
Training Glow with Constant Memory Cost: Xuechen Li, Will Grathwohl.
NIPS 2018 Workshop on Bayesian Deep Learning
Gradient-Based Optimization of Neural Network Architecture: Will Grathwohl, Elliot Creager, Kamyar Ghasemipour, Richard Zemel.
ICLR 2018 Workshop.
Backpropagation through the Void: Optimizing control variates for black-box gradient estimation: Will Grathwohl, Dami Choi, Yuhuai Wu, Geoff Roeder, David Duvenaud.
NIPS 2017 Deep Reinforcement Learning Symposium. Oral Presentation. Video of my talk found here.
Using and Abusing Gradients for Discrete MCMC and Energy-Based Models: ICLR 2021 Workshop on Energy-Based Models, May 2021. Link.
Using and Abusing Gradients for Discrete MCMC and Energy-Based Models: CMU Artificial Intelligence Seminar Series, Feb 2021. Link.
Your Brain on Energy-Based Models: Seminar on Theoretical Machine Learning, Institute for Advanced Study, March 2020. Video.
Your Classifier is Secretly an Energy-Based Model and You Should Treat it Like One: Generative Models and Uncertainty, Copenhagen, Denmark, October 2019.
A workshop on generative models organized at the Technical University of Denmark.
Awards and Fellowships
Borealis AI Graduate Fellowship: A $50,000, two-year fellowship funding research in AI, funded by the Royal Bank of Canada.
Huawei Prize: A financial award based on academic and research performance.
ICLR 2018 Travel Award
Best Paper Award: Symposium on Advances in Approximate Bayesian Inference 2018