I am a Research Scientist at NVIDIA within the Toronto AI Lab.

I completed my PhD under the supervision of Rich Zemel and Roger Grosse.

I have a broad set of interests, within deep learning and elsewhere. My research thus far has focused on improving optimization for deep neural networks and better understanding their loss landscape geometry. I have also worked on imposing functional constraints on neural networks, providing theoretical guarantees for learning with limited data, and investigating representation learning.

Moving forwards, I'm excited to continue developing our understanding of deep neural networks and seeking new applications. I'm particularly keen to work on deep learning applications within graphics and 3D geometry processing, with an eye towards bringing deep learning tools into game development.

jlucas [at] cs [dot] toronto [dot] edu (take that, bots)

I also enjoy:

- Being a parent to a wonderful, tiny, noisy human
- Big fluffy dogs
- Developing video games
- Baking (especially bread)

Check my Google Scholar profile for an up-to-date list of publications.

Probing Few-Shot Generalization with Attributes

Mengye Ren*, Eleni Triantafillou*, Kuan-Chieh Wang*, **James Lucas***, Jake Snell, Xaq Pitkow, Andreas S. Tolias, Richard Zemel

We investigate under what settings Few-Shot Learners are able to generalize by introducing attribute-dependent decision boundaries that vary across episodes. Empirically, we find that unsupervised representation learning significantly outperforms supervised approaches, which overfit to the training attributes.

Spacetime Representation Learning

Marc Law, **James Lucas**

We introduce a general family of representations for directed graphs through connected time-oriented Lorentz manifolds, called *spacetimes* in general relativity. Spacetimes intrinsically contain a causal structure that captures ordering between points of the manifold. We empirically evaluate our framework in the tasks of hierarchy extraction of undirected graphs, directed link prediction and representation of directed graphs.

Optimizing Data Collection for Machine Learning

Rafid Mahmood, **James Lucas**, Jose M. Alvarez, Sanja Fidler, Marc T. Law

We design a formal *optimal data collection problem* that allows designers to specify performance targets, collection costs, a time horizon, and penalties for failing to meet the targets. We provide theoretical guarantees on this problem and solve it practically using gradient descent. We achieve high-accuracy estimates of data requirements.

(CVPR 2022) - How Much More Data Do I Need? Estimating Requirements for Downstream Tasks

Rafid Mahmood, **James Lucas**, David Acuna, Daiqing Li, Jonah Philion, Jose M. Alvarez, Zhiding Yu, Sanja Fidler, Marc T. Law

Given a small training data set and a learning algorithm, how much more data is necessary to reach a target validation or test performance? We develop a method to accurately predict data requirements and evaluate it over a wide range of computer vision tasks.

(ICML 2021) - Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes

**James Lucas**, Juhan Bae, Michael R. Zhang, Stanislav Fort, Richard Zemel, Roger Grosse

We analyze the Monotonic Linear Interpolation (MLI) property, wherein linearly interpolating from initialization to optimum leads to a monotonic decrease in the loss. Using tools from differential geometry, we provide sufficient conditions for MLI to hold and provide a thorough empirical investigation of the phenomenon.
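
A minimal sketch of how the MLI property could be checked numerically on a discretized path. The function names, the toy losses, and the number of interpolation points are all illustrative assumptions, not the code used in the paper.

```python
import math


def interpolate(theta_init, theta_final, alpha):
    """Return (1 - alpha) * theta_init + alpha * theta_final, elementwise."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(theta_init, theta_final)]


def satisfies_mli(loss_fn, theta_init, theta_final, num_points=50, tol=1e-12):
    """Check that the loss never increases along the straight line between two points."""
    losses = [
        loss_fn(interpolate(theta_init, theta_final, i / (num_points - 1)))
        for i in range(num_points)
    ]
    return all(later <= earlier + tol for earlier, later in zip(losses, losses[1:]))


# Hypothetical toy loss: a convex quadratic with its minimum at (1, -2).
def quadratic_loss(theta):
    return (theta[0] - 1.0) ** 2 + 3.0 * (theta[1] + 2.0) ** 2


# Hypothetical non-convex 1-D loss with bumps between the two endpoints.
def bumpy_loss(theta):
    x = theta[0]
    return (x - 1.0) ** 2 + 2.0 * math.sin(3.0 * math.pi * x) ** 2


print(satisfies_mli(quadratic_loss, [4.0, 3.0], [1.0, -2.0]))  # True
print(satisfies_mli(bumpy_loss, [0.0], [1.0]))                 # False
```

The check only samples the path at finitely many points, so it can miss an increase that falls between samples.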

(ICLR 2021) - Theoretical bounds on estimation error for meta-learning

**James Lucas**, Mengye Ren, Irene Raisa KAMENI KAMENI, Toniann Pitassi, Richard Zemel

We prove minimax lower and upper bounds for generalization of meta-learners trained on multiple source tasks to a novel test task. We apply these bounds to the analysis of meta-learning over hierarchical linear models.

(NeurIPS 2020) - Regularized linear autoencoders recover the principal components, eventually

Xuchan Bao, **James Lucas**, Sushant Sachdeva, Roger Grosse

We prove that a linear VAE can learn axis-aligned principal components, but doing so with regularization leads to intractably slow convergence (investigated via Hessian analysis). We present a novel algorithm to overcome these challenges.

(NeurIPS 2019) - Don't Blame the ELBO: A linear VAE perspective on Posterior Collapse

**James Lucas**, George Tucker, Roger Grosse, Mohammad Norouzi

Through analyzing the linear VAE we identify a role of optimization in posterior collapse. We conduct a thorough empirical investigation and find that the linear VAE is largely predictive of posterior collapse in deep VAE architectures.

(NeurIPS 2019) - Lookahead Optimizer: k steps forward, 1 step back

Michael R. Zhang, **James Lucas**, Geoffrey Hinton, Jimmy Ba

Lookahead iteratively updates two sets of weights. Intuitively, the algorithm chooses a search direction by looking ahead at the sequence of "fast weights" generated by another optimizer and then uses linear interpolation to update the "slow weights".
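
The update described above can be sketched in a few lines. This is an illustrative sketch only: it assumes plain gradient descent as the inner optimizer, and the function name, hyperparameter values, and toy objective are hypothetical rather than taken from the paper.

```python
def lookahead_minimize(grad_fn, init, k=5, alpha=0.5, inner_lr=0.1, outer_steps=100):
    """Lookahead wrapped around plain gradient descent (an assumed inner optimizer)."""
    slow = list(init)
    for _ in range(outer_steps):
        fast = list(slow)  # fast weights start from the current slow weights
        for _ in range(k):  # k steps of the inner optimizer on the fast weights
            grad = grad_fn(fast)
            fast = [w - inner_lr * g for w, g in zip(fast, grad)]
        # Slow-weight update: linear interpolation towards the final fast weights.
        slow = [s + alpha * (f - s) for s, f in zip(slow, fast)]
    return slow


# Hypothetical toy objective: f(x, y) = (x - 1)^2 + 3 * (y + 2)^2.
def toy_grad(theta):
    x, y = theta
    return [2.0 * (x - 1.0), 6.0 * (y + 2.0)]


solution = lookahead_minimize(toy_grad, init=[4.0, 3.0])
print(solution)  # close to the minimizer [1.0, -2.0]
```

Setting `alpha=1.0` reduces this to running the inner optimizer alone, while smaller values pull the slow weights only part of the way towards the fast weights.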

(NeurIPS 2019) - Preventing Gradient Attenuation in Lipschitz Constrained Convolutional Networks

Qiyang Li*, Saminul Haque*, Cem Anil, **James Lucas**, Roger Grosse, Jörn-Henrik Jacobsen

We identify several limitations of commonly used approaches to constrain the Lipschitz constant of convolutional architectures. By focusing on a gradient norm preserving design principle, we overcome these issues and recover expressive, Lipschitz constrained, convolutional neural networks.

(ICML 2019) - Sorting out Lipschitz function approximation

Cem Anil*, **James Lucas***, Roger Grosse

Common activation functions are insufficient for norm-constrained (1-Lipschitz) network architectures. By using a gradient norm preserving activation, *GroupSort*, we prove universal approximation in this setting and achieve provable adversarial robustness with hinge loss.
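
A minimal sketch of a GroupSort-style activation, assuming it splits the pre-activations into fixed-size groups and sorts the values within each group (a detail not spelled out above). The function name, the ascending sort order, and the default group size are illustrative assumptions.

```python
def group_sort(values, group_size=2):
    """Sort the entries of `values` within consecutive groups of `group_size`."""
    if len(values) % group_size != 0:
        raise ValueError("number of units must be divisible by group_size")
    out = []
    for start in range(0, len(values), group_size):
        # Sorting only reorders entries, so the output is a permutation of the input.
        out.extend(sorted(values[start:start + group_size]))
    return out


pre_activations = [3.0, -1.0, 0.5, 2.0]
print(group_sort(pre_activations))                 # [-1.0, 3.0, 0.5, 2.0]
print(group_sort(pre_activations, group_size=4))   # [-1.0, 0.5, 2.0, 3.0]
```

Because the output is a permutation of the input, the activation leaves the vector norm unchanged, which is the sense in which it can preserve gradient norm.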

(ICLR 2019) - Aggregated Momentum: Stability Through Passive Damping

**James Lucas**, Shengyang Sun, Richard Zemel, Roger Grosse

A simple trick for improving momentum optimization. It is trivial to implement and keeps optimization stable at aggressive damping parameters (e.g. 0.999).

(ICML 2018) - Adversarial Distillation of Bayesian Neural Network Posteriors

Kuan-Chieh Wang, Paul Vicol, **James Lucas**, Li Gu, Roger Grosse, Richard Zemel

We use SGLD with GANs to produce posterior samples for BNNs. The approach has the flexibility of MCMC but with constant memory cost at test time.

(ICML 2022 DFUQ) - Calibration Generalization

Annabelle Carrell, Neil Mallinar, **James Lucas**, Preetum Nakkiran

A fundamental property of a good predictive model is to be well-calibrated. But calibration of deep neural networks remains poorly understood. We show that the calibration of neural networks can be determined by their calibration at training time and the extent to which they fail to generalize.

(ICCV 2021 AVVision) - Causal BERT: Improving object detection by searching for challenging groups

Cinjon Resnick, Or Litany, Amlan Kar, Karsten Kreis, **James Lucas**, Kyunghyun Cho, Sanja Fidler

Autonomous vehicles (AV) often rely on perception modules built upon neural networks for object detection. These modules frequently have low expected error overall but high error on unknown groups due to biases inherent in the training process. We present a method to find such groups in foresight, leveraging advances in simulation as well as masked language modeling in order to perform causal interventions on simulated driving scenes.

(NeurIPS 2020 OptML) - On Monotonic Linear Interpolation of Neural Network Parameters

**James Lucas**, Juhan Bae, Michael Zhang, Jimmy Ba, Richard Zemel, Roger Grosse

Linearly interpolating between initial and final weights provides a path along which the loss often monotonically decreases. We investigate this property and work towards a better understanding of its cause.

(NeurIPS 2020 MetaLearn) - Flexible Few-Shot Learning of Contextual Similarity

Mengye Ren*, Eleni Triantafillou*, Kuan-Chieh Wang*, **James Lucas***, Jake Snell, Xaq Pitkow, Andreas S. Tolias, Richard Zemel

We extend Few-Shot Learning to include context-dependent classification criteria that vary across episodes. We evaluate existing Few-Shot Learners and find that unsupervised representation learning significantly outperforms supervised approaches, which overfit to the training contexts.

(NeurIPS 2019 MLWG - Contributed Oral) - Information-theoretic limitations on novel task generalization

**James Lucas**, Mengye Ren, Irene Kameni, Toniann Pitassi, Richard Zemel

We provide novel information-theoretic lower bounds on minimax rates of convergence for algorithms that are trained on data from multiple sources and tested on novel data.

(ICLR 2019) - Understanding posterior collapse in generative latent variable models

**James Lucas**, George Tucker, Roger Grosse, Mohammad Norouzi

We study the Linear VAE model and, using a direct correspondence to pPCA, analyze its loss landscape.

In the Fall of 2017 I taught CSC411/2515 - Introduction to Machine Learning.