I'm a Ph.D. student in machine learning at the University of Toronto and the Vector Institute. My research interests are in meta-learning, learning with multiple agents, the intersection of machine learning with game theory, and, more generally, nested optimization.
I recently finished my M.Sc.A.C. with a focus in data science.
Email: lorraine@cs.toronto.edu
Location: Vector Institute, MaRS Center, 661 University Ave., Suite 710, Toronto, ON M5G 1M1
Advisor: David Duvenaud
Optimizing Millions of Hyperparameters by Implicit Differentiation
We propose an algorithm for inexpensive gradient-based hyperparameter optimization that combines the implicit function theorem (IFT) with efficient inverse Hessian approximations. We present results on the relationship between the IFT and differentiating through optimization, motivating our algorithm. We use the proposed approach to train modern network architectures with millions of weights and millions of hyperparameters. We learn a data-augmentation network, where every weight is a hyperparameter tuned for validation performance, that outputs augmented training examples; we learn a distilled dataset where each feature in each datapoint is a hyperparameter; and we tune millions of regularization hyperparameters. Jointly tuning weights and hyperparameters with our approach is only a few times more costly in memory and compute than standard training.
Jonathan Lorraine, Paul Vicol, David Duvenaud. arXiv  bibtex  slides  blog
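Below is a minimal sketch (assuming PyTorch) of the kind of hypergradient computation described above: the inverse training-loss Hessian is replaced by a truncated Neumann series evaluated with Hessian-vector products. The function name, truncation depth `K`, and step size `alpha` are illustrative choices rather than the paper's exact interface.

```python
import torch

def ift_hypergradient(train_loss, val_loss, params, hparams, K=20, alpha=0.1):
    # Direct dependence of the validation loss on the weights: v = dL_val/dw.
    v = torch.autograd.grad(val_loss, params, retain_graph=True)

    # dL_train/dw, kept in the graph so we can take Hessian-vector products.
    dtrain_dw = torch.autograd.grad(train_loss, params, create_graph=True)

    # Truncated Neumann series: H^{-1} v ~= alpha * sum_{j=0..K} (I - alpha*H)^j v.
    cur = [vi.clone() for vi in v]
    p = [alpha * vi for vi in v]
    for _ in range(K):
        hvp = torch.autograd.grad(dtrain_dw, params, grad_outputs=cur,
                                  retain_graph=True)
        cur = [c - alpha * h for c, h in zip(cur, hvp)]
        p = [pi + alpha * c for pi, c in zip(p, cur)]

    # Mixed second-derivative term (d^2 L_train / dlambda dw) applied to p.
    indirect = torch.autograd.grad(dtrain_dw, hparams, grad_outputs=p,
                                   retain_graph=True, allow_unused=True)
    direct = torch.autograd.grad(val_loss, hparams, allow_unused=True)

    hyper_grads = []
    for hp, d, ind in zip(hparams, direct, indirect):
        d = torch.zeros_like(hp) if d is None else d
        ind = torch.zeros_like(hp) if ind is None else ind
        # dL_val/dlambda ~= direct term - p . (d^2 L_train / dw dlambda)
        hyper_grads.append(d - ind)
    return hyper_grads
```

In practice, the hypergradient returned here would drive an optimizer step on the hyperparameters, alternating with ordinary training steps on the weights.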

JacNet: Learning Functions with Structured Jacobians
Neural networks are trained to learn an approximate mapping from an input domain to a target domain. Often, incorporating prior knowledge about the true mapping is critical to learning a useful approximation. With current architectures, it is difficult to enforce structure on the derivatives of the input-output mapping. We propose to directly learn the Jacobian of the input-output function with a neural network, which allows easy control of the derivative. We focus on structuring the derivative to allow invertibility, and also demonstrate that other useful priors, such as k-Lipschitz constraints, can be enforced. Using this approach, we learn approximations to simple functions that are guaranteed to be invertible, and we can easily compute their inverses. We also show similar results for 1-Lipschitz functions.
Jonathan Lorraine, Safwan Hossain. ICML INNF Workshop, 2019. paper  bibtex  poster
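A hedged one-dimensional sketch of the idea, assuming PyTorch; the network, integration scheme, and bisection-based inverse are illustrative choices, not the paper's construction. The derivative is parameterized directly and kept positive, so the integrated map is monotone and therefore invertible.

```python
import torch
import torch.nn as nn

class MonotoneFromJacobian(nn.Module):
    """Learn dy/dx directly; softplus keeps it positive, so y is invertible."""
    def __init__(self, hidden=64, n_steps=128):
        super().__init__()
        self.jac_net = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(),
                                     nn.Linear(hidden, 1))
        self.n_steps = n_steps

    def derivative(self, x):
        return nn.functional.softplus(self.jac_net(x))  # dy/dx > 0 everywhere

    def forward(self, x):                       # x: (batch, 1)
        # y(x) = integral of dy/dt from 0 to x, via the trapezoid rule.
        t = torch.linspace(0.0, 1.0, self.n_steps, device=x.device).view(1, -1)
        pts = x * t                             # integration grid from 0 to x
        deriv = self.derivative(pts.reshape(-1, 1)).view_as(pts)
        avg = 0.5 * (deriv[:, 1:] + deriv[:, :-1])
        dt = pts[:, 1:] - pts[:, :-1]
        return (avg * dt).sum(dim=1, keepdim=True)

    def inverse(self, y, lo=-10.0, hi=10.0, iters=60):
        # Monotonicity makes the inverse well-defined; bisection finds it.
        lo = torch.full_like(y, lo)
        hi = torch.full_like(y, hi)
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            too_big = self.forward(mid) > y
            hi = torch.where(too_big, mid, hi)
            lo = torch.where(too_big, lo, mid)
        return 0.5 * (lo + hi)
```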

Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions
Hyperparameter optimization can be formulated as a bilevel optimization problem, where the optimal parameters on the training set depend on the hyperparameters. We aim to adapt regularization hyperparameters for neural networks by fitting compact approximations to the best-response function, which maps hyperparameters to optimal weights and biases. We show how to construct scalable best-response approximations for neural networks by modeling the best-response as a single network whose hidden units are gated conditionally on the regularizer. We justify this approximation by showing that the exact best-response for a shallow linear network with L2-regularized Jacobian can be represented by a similar gating mechanism. We fit this model using a gradient-based hyperparameter optimization algorithm which alternates between approximating the best-response around the current hyperparameters and optimizing the hyperparameters using the approximate best-response function. Unlike other gradient-based approaches, we do not require differentiating the training loss with respect to the hyperparameters, allowing us to tune discrete hyperparameters, data augmentation hyperparameters, and dropout probabilities. Because the hyperparameters are adapted online, our approach discovers hyperparameter schedules that can outperform fixed hyperparameter values. Empirically, our approach outperforms competing hyperparameter optimization methods on large-scale deep learning problems. We call our networks, which update their own hyperparameters online during training, Self-Tuning Networks (STNs).
Matthew MacKay, Paul Vicol, Jonathan Lorraine, David Duvenaud, Roger Grosse. International Conference on Learning Representations, 2019. arXiv  bibtex  slides  poster  code  blog
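A hedged sketch, assuming PyTorch, of a hyperparameter-gated layer in the spirit of the structured best-response approximation described above; the exact gating and scaling used in the paper may differ.

```python
import torch
import torch.nn as nn

class GatedResponseLinear(nn.Module):
    """Linear layer whose effective weights depend on the hyperparameters."""
    def __init__(self, in_dim, out_dim, n_hparams):
        super().__init__()
        self.w_base = nn.Parameter(torch.randn(out_dim, in_dim) * 0.01)
        self.w_pert = nn.Parameter(torch.randn(out_dim, in_dim) * 0.01)
        self.b_base = nn.Parameter(torch.zeros(out_dim))
        self.b_pert = nn.Parameter(torch.zeros(out_dim))
        # Map from (transformed) hyperparameters to a per-unit gate.
        self.gate = nn.Linear(n_hparams, out_dim)

    def forward(self, x, hparams):
        # hparams: a vector of shape (n_hparams,). Each hidden unit's weights
        # are shifted along a learned perturbation direction by an amount that
        # depends on the current hyperparameter values.
        g = torch.sigmoid(self.gate(hparams))            # (out_dim,)
        w = self.w_base + g.unsqueeze(-1) * self.w_pert  # (out_dim, in_dim)
        b = self.b_base + g * self.b_pert
        return nn.functional.linear(x, w, b)
```

Training would then alternate between a step on the layer's parameters at hyperparameters perturbed around the current values, and a step on the hyperparameters themselves using the validation loss evaluated through the gated layer.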

Understanding Neural Architecture Search
Automatic methods for generating state-of-the-art neural network architectures without human experts have attracted significant attention recently, because removing human experts from the design loop can reduce costs and decrease time to model deployment. Neural architecture search (NAS) techniques have improved significantly in their computational efficiency since the original NAS was proposed. This reduction in computation is enabled via weight sharing, such as in Efficient Neural Architecture Search (ENAS). However, a recent body of work confirms our discovery that ENAS does not do significantly better than random search with weight sharing, contradicting the initial claims of the authors. We provide an explanation for this phenomenon by investigating the interpretability of the ENAS controller's hidden state. We are interested in whether the controller embeddings are predictive of any properties of the final architecture, for example, graph properties like the number of connections, or validation performance. We find that models sampled from identical controller hidden states have no correlation in various graph similarity metrics. This failure mode implies the RNN controller does not condition on past architecture choices. Importantly, we may need to condition on past choices if certain connection patterns prevent vanishing or exploding gradients. Lastly, we propose a solution to this failure mode by forcing the controller's hidden state to encode past decisions, training it with a memory buffer of previously sampled architectures. Doing this improves hidden-state interpretability by increasing the correlation between controller hidden states and graph similarity metrics.
George Adam, Jonathan Lorraine. arXiv  bibtex
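A small sketch, assuming Python with NumPy/SciPy, of the kind of interpretability check described above: pairwise cosine similarity of controller hidden states is compared against a pairwise graph-similarity measure over the sampled architectures. The edge-set Jaccard similarity and the data layout are illustrative stand-ins for the graph metrics used in the paper.

```python
import numpy as np
from scipy.stats import spearmanr

def edge_jaccard(arch_a, arch_b):
    # Architectures represented as collections of (src_node, dst_node, op) edges.
    a, b = set(arch_a), set(arch_b)
    return len(a & b) / max(len(a | b), 1)

def hidden_vs_graph_correlation(hidden_states, architectures):
    # hidden_states: list of 1-D numpy arrays; architectures: list of edge lists.
    h_sims, g_sims = [], []
    for i in range(len(hidden_states)):
        for j in range(i + 1, len(hidden_states)):
            hi, hj = hidden_states[i], hidden_states[j]
            cos = hi @ hj / (np.linalg.norm(hi) * np.linalg.norm(hj))
            h_sims.append(cos)
            g_sims.append(edge_jaccard(architectures[i], architectures[j]))
    rho, _ = spearmanr(h_sims, g_sims)
    # A rank correlation near zero suggests the hidden state does not encode
    # the structure of the architectures sampled from it.
    return rho
```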

Stochastic Hyperparameter Optimization Through Hypernetworks
Machine learning models are often tuned by nesting optimization of model weights inside the optimization of hyperparameters. We give a method to collapse this nested optimization into joint stochastic optimization of weights and hyperparameters. Our process trains a neural network to output approximately optimal weights as a function of hyperparameters. We show that our technique converges to locally optimal weights and hyperparameters for sufficiently large hypernetworks. We compare this method to standard hyperparameter optimization strategies and demonstrate its effectiveness for tuning thousands of hyperparameters.
Jonathan Lorraine, David Duvenaud. NIPS Meta-Learning Workshop, 2017. arXiv  bibtex  slides  poster  code
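A minimal sketch of the joint optimization, assuming PyTorch; the toy losses, network sizes, noise scale, and optimizers are placeholders chosen only to illustrate the alternating pattern of hypernetwork and hyperparameter updates.

```python
import torch
import torch.nn as nn

n_hparams, n_weights = 1, 100                  # illustrative sizes
hyper_net = nn.Sequential(nn.Linear(n_hparams, 64), nn.ReLU(),
                          nn.Linear(64, n_weights))
log_lam = torch.zeros(n_hparams, requires_grad=True)   # e.g. log L2 strength

opt_w = torch.optim.Adam(hyper_net.parameters(), lr=1e-3)
opt_h = torch.optim.Adam([log_lam], lr=1e-2)

def train_loss(w, log_l):
    fit = (w - 2.0).pow(2).mean()               # placeholder training fit term
    reg = log_l.exp().sum() * w.pow(2).mean()   # L2 penalty with tunable strength
    return fit + reg

def val_loss(w):
    return (w - 1.0).pow(2).mean()              # placeholder validation objective

for step in range(2000):
    # (1) Fit the hypernetwork near the current hyperparameters, so it outputs
    #     approximately optimal weights in a local neighbourhood.
    noisy = (log_lam + 0.1 * torch.randn_like(log_lam)).detach()
    w = hyper_net(noisy)
    opt_w.zero_grad()
    train_loss(w, noisy).backward()
    opt_w.step()

    # (2) Update the hyperparameters by differentiating the validation loss
    #     through the hypernetwork's output.
    w = hyper_net(log_lam)
    opt_h.zero_grad()
    val_loss(w).backward()
    opt_h.step()
```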
Maximizing the Trading Area of a New Facility
Designed an algorithm for finding a point to add to a Voronoi diagram such that the associated Voronoi cell has maximal area. The algorithm was applied to compute an optimal LCBO placement in Toronto, and the location's value was confirmed by LCBO representatives. The work was completed as a research internship supported by NSERC.
Dmitry Krass, Atsuo Suzuki

On Covering Location Problems on Networks with Edge Demand
This paper considers two covering location problems on a network where the demand is distributed along the edges. The first is the classical maximal covering location problem. The second is the obnoxious version, where coverage should be minimized subject to distance constraints between the facilities. It is first shown that the finite dominating set for covering problems with nodal demand does not carry over to the case of edge-based demand. Then, a solution approach for the single-facility problem is presented. Afterwards, the multi-facility problem is discussed and several discretization results for tree networks are presented for the case where the demand is constant on each edge; unfortunately, these results do not carry over to general networks, as a counterexample shows. To tackle practical problems, the conditional version of the problem is considered and a greedy heuristic is introduced. Afterwards, numerical tests are presented to underline the practicality of the proposed algorithms and to understand the conditions under which accurate modeling of edge-based demand and a continuous edge-based location space are particularly important.
Oded Berman, Jörg Kalcsics, Dmitry Krass. paper

Optimizing Facility Location and Design
In this paper we develop a novel methodology to simultaneously optimize locations and designs for a set of new facilities facing competition from pre-existing facilities. Known as the Competitive Facility Location and Design Problem (CFLDP), this model was previously only solvable when a limited number of design scenarios was prespecified. Our methodology removes this limitation and allows much more realistic models to be solved. The results are illustrated with a small case study.
Robert Aboolian, Oded Berman, Dmitry Krass. paper