Publications

Andrew Wang, Elisa Nguyen, Runshi Yang, Juhan Bae, Sheila McIlraith, and Roger Grosse. Better Training Data Attribution via Better Inverse Hessian-Vector Products. NeurIPS 2025.
Bruno Mlodozeniec, Isaac Reid, Samuel Power, David Krueger, Murat Erdogdu, Richard Turner, and Roger Grosse. Distributional Training Data Attribution. NeurIPS 2025.
Stephen Zhao, Aidan Li, Rob Brekelmans, and Roger Grosse. Reducing the Probability of Bad Outputs in Language Models Using Probabilistic Inference. NeurIPS 2025.
Sang Keun Choe, Hwijeen Ahn, Juhan Bae, Kewen Zhao, Youngseog Chung, Adithya Pratapa, Willie Neiswanger, Emma Strubell, Teruko Mitamura, Jeff Schneider, Eduard Hovy, Roger Grosse, and Eric P. Xing. What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions. NeurIPS 2025.
Juhan Bae, Wu Lin, Jonathan Lorraine, and Roger Grosse. Training Data Attribution via Approximate Unrolled Differentiation. NeurIPS 2024.
Johannes Treutlein, Dami Choi, Jan Betley, Samuel Marks, Cem Anil, Roger Grosse, and Owain Evans. Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data. NeurIPS 2024.
Stephen Zhao, Rob Brekelmans, Alireza Makhzani, and Roger Grosse. Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo. ICML 2024.
Nathan Ng, Roger Grosse, and Marzyeh Ghassemi. Measuring Stochastic Data Complexity with Boltzmann Influence Functions. ICML 2024.
Jin Peng Zhou, Yuhuai Wu, Qiyang Li, and Roger Grosse. REFACTOR: Learning to Extract Theorems from Proofs. ICLR 2024.
Caspar Oesterheld, Johannes Treutlein, Roger Grosse, Vincent Conitzer, and Jakob Foerster. Similarity-Based Cooperative Equilibrium. NeurIPS 2023.
Nikita Dhawan, Sicong Huang, Juhan Bae, and Roger Grosse. Efficient Parametric Approximations of Neural Network Function Space Distance. ICML 2023.
Juhan Bae, Michael R. Zhang, Michael Ruan, Eric Wang, So Hasegawa, Jimmy Ba, and Roger Grosse. Multi-Rate VAE: Train Once, Get the Full Rate-Distortion Curve. ICLR 2023.
Stephen Zhao, Chris Lu, Roger Grosse, and Jakob Foerster. Proximal learning with opponent learning awareness. NeurIPS 2022.
Juhan Bae, Paul Vicol, Jeff Z. HaoChen, and Roger Grosse. Amortized proximal optimization. NeurIPS 2022.
Juhan Bae, Nathan Ng, Alston Lo, Marzyeh Ghassemi, and Roger Grosse. If influence functions are the answer, then what is the question? NeurIPS 2022.
Cem Anil, Ashwini Pokle, Kaiqu Liang, Johannes Treutlein, Yuhuai Wu, Shaojie Bai, J. Zico Kolter, and Roger Grosse. Path independent equilibrium networks can better exploit test-time computation. NeurIPS 2022.
Paul Vicol, Jonathan Lorraine, Fabian Pedregosa, David Duvenaud, and Roger Grosse. On implicit bias in overparameterized bilevel optimization. ICML 2022.
Rob Brekelmans, Sicong Huang, Marzyeh Ghassemi, Greg ver Steeg, Roger Grosse, and Alireza Makhzani. Improving Mutual Information Estimation with Annealed and Energy-Based Bounds. ICLR 2022.
Guodong Zhang, Kyle Hsu, Jianing Li, Chelsea Finn, and Roger Grosse. Differentiable Annealed Importance Sampling and the Perils of Gradient Noise. NeurIPS 2021.
Shengyang Sun, Jiaxin Shi, Andrew Gordon Wilson, and Roger Grosse. Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition. ICML 2021. Code
James Lucas, Juhan Bae, Michael Zhang, Stanislav Fort, Richard Zemel, and Roger Grosse. Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes. ICML 2021.
Yuhuai Wu, Markus Rabe, Wenda Li, Jimmy Ba, Roger Grosse, and Christian Szegedy. LIME: Learning inductive bias for primitives of mathematical reasoning. ICML 2021.
Guodong Zhang, Xuchan Bao, Laurent Lessard, and Roger Grosse. A unified analysis of first-order methods for smooth games via integral quadratic constraints. JMLR 2021. Code
Chaoqi Wang, Shengyang Sun, and Roger Grosse. Beyond marginal uncertainty: How accurately can Bayesian regression models estimate posterior predictive correlations? AISTATS 2021. Code
Jens Behrmann, Paul Vicol, Kuan-Chieh Wang, Roger Grosse, and Jorn-Henrik Jacobsen. Understanding and mitigating exploding inverses in invertible neural networks. AISTATS 2021. Code
Yuhuai Wu, Albert Jiang, Jimmy Ba, and Roger Grosse. INT: An Inequality Benchmark for Evaluating Generalization in Theorem Proving. ICLR 2021. Code
Shun-ichi Amari, Jimmy Ba, Roger Grosse, Xuechen Li, Atsushi Nitanda, Taiji Suzuki, Denny Wu, and Ji Xu. When does preconditioning help or hurt generalization? ICLR 2021.
Pashootan Vaezipoor, Gil Lederman, Yuhuai Wu, Chris J. Maddison, Roger Grosse, Edward Lee, Sanjit A. Seshia, and Fahiem Bacchus. Learning Branching Heuristics for Propositional Model Counting. AAAI 2021.
Juhan Bae and Roger Grosse. Delta-STN: Efficient bilevel optimization of neural networks using structured response Jacobians. NeurIPS 2020. Code
Xuchan Bao, James Lucas, Sushant Sachdeva, and Roger Grosse. Regularized linear autoencoders recover the principal components, eventually. NeurIPS 2020. Code
Sheldon Huang, Alireza Makhzani, Yanshuai Cao, and Roger Grosse. Evaluating lossy compression rates of deep generative models. ICML 2020. Code
Chaoqi Wang, Guodong Zhang, and Roger Grosse. Picking winning tickets before training by preserving gradient flow. ICLR 2020. Code
Guodong Zhang, Lala Li, Zachary Nado, James Martens, Sushant Sachdeva, George E. Dahl, Christopher J. Shallue, and Roger Grosse. Which algorithmic choices matter at which batch sizes? Insights from a noisy quadratic model. NeurIPS 2019. Code
Guodong Zhang, James Martens, and Roger Grosse. Fast convergence of natural gradient descent for overparameterized neural networks. NeurIPS 2019.
James Lucas, George Tucker, Roger Grosse, and Mohammad Norouzi. Don't blame the ELBO! A linear VAE perspective on posterior collapse. NeurIPS 2019. Code
Qiyang Li, Saminul Haque, Cem Anil, James Lucas, Roger Grosse, and Jorn-Henrik Jacobsen. Preventing gradient attenuation in Lipschitz-constrained convolutional networks. NeurIPS 2019. Code
Cem Anil, James Lucas, and Roger Grosse. Sorting out Lipschitz function approximation. ICML 2019. Code
Chaoqi Wang, Roger Grosse, Sanja Fidler, and Guodong Zhang. EigenDamage: structured pruning in the Kronecker-factored eigenbasis. ICML 2019. Code
Matthew MacKay, Paul Vicol, Jonathan Lorraine, David Duvenaud, and Roger Grosse. Self-tuning networks: bilevel optimization of hyperparameters using structured best-response functions. ICLR 2019. Code
Shengyang Sun, Guodong Zhang, Jiaxin Shi, and Roger Grosse. Functional variational Bayesian neural networks. ICLR 2019. Code
Sheldon Huang, Qiyang Li, Cem Anil, Xuchan Bao, Sageev Oore, and Roger Grosse. TimbreTron: A WaveNet(CycleGAN(CQT(audio))) pipeline for musical timbre transfer. ICLR 2019.
Guodong Zhang, Chaoqi Wang, Bowen Xu, and Roger Grosse. Three mechanisms of weight decay regularization. ICLR 2019. Code
James Lucas, Shengyang Sun, Richard Zemel, and Roger Grosse. Aggregated momentum: stability through passive damping. ICLR 2019. Code
Matthew MacKay, Paul Vicol, Jimmy Ba, and Roger Grosse. Reversible recurrent neural networks. NIPS 2018. Code
Tian Qi Chen, Xuechen Li, Roger Grosse, and David Duvenaud. Isolating sources of disentanglement in variational autoencoders. NIPS 2018. Code
Shengyang Sun, Guodong Zhang, Chaoqi Wang, Wenyuan Zeng, Jiaman Li, and Roger Grosse. Differentiable compositional kernel learning for Gaussian processes. ICML 2018. Code
Guodong Zhang, Shengyang Sun, David Duvenaud, and Roger Grosse. Noisy natural gradient as variational inference. ICML 2018. Code: 1, 2
Kuan-Chieh Wang, Paul Vicol, James Lucas, Li Gu, Roger Grosse, and Richard Zemel. Adversarial distillation of Bayesian neural network posteriors. ICML 2018. Code
Yeming Wen, Paul Vicol, Jimmy Ba, Dustin Tran, and Roger Grosse. Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches. ICLR 2018.
Yuhuai Wu, Mengye Ren, Renjie Liao, and Roger Grosse. Understanding short-horizon bias in stochastic meta-optimization. ICLR 2018.
Yuhuai Wu, Elman Mansimov, Shun Liao, Roger Grosse, and Jimmy Ba. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation. NIPS 2017.
- code (OpenAI Baselines)
Aidan Gomez, Mengye Ren, Raquel Urtasun, and Roger Grosse. The Reversible Residual Network: Backpropagation Without Storing Activations. NIPS 2017.
- code
Jacob Gardner, Chuan Guo, Kilian Weinberger, Roman Garnett, and Roger Grosse. Discovering and exploiting additive structure for Bayesian optimization. AISTATS 2017.
Jimmy Ba, Roger Grosse, and James Martens. Distributed second-order optimization using Kronecker-factored approximations. ICLR 2017.
Yuhuai Wu, Yuri Burda, Ruslan Salakhutdinov, and Roger Grosse. On the quantitative analysis of decoder-based generative models. ICLR 2017.
- code
Roger Grosse, Siddharth Ancha, and Daniel Roy. Measuring the reliability of MCMC inference with bidirectional Monte Carlo. NIPS 2016.
- arXiv
Roger Grosse and James Martens. A Kronecker-factored approximate Fisher matrix for convolution layers. ICML 2016.
- ICML version
Yuri Burda, Roger Grosse, and Ruslan Salakhutdinov. Importance weighted autoencoders. ICLR 2016.
- code
Jimmy Ba, Roger Grosse, Ruslan Salakhutdinov, and Brendan Frey. Learning wake-sleep recurrent attention models. NIPS 2015.
- NIPS version
Roger Grosse and Ruslan Salakhutdinov. Scaling up natural gradient by sparsely factorizing the inverse Fisher matrix. ICML 2015.
- code
James Martens and Roger Grosse. Optimizing Neural Networks with Kronecker-factored Approximate Curvature. ICML 2015.
- ICML version, and appendix (terser and less readable than the arXiv version)
Yuri Burda, Roger B. Grosse, and Ruslan Salakhutdinov. Accurate and conservative estimates of MRF log-likelihood using reverse annealing. AISTATS 2015.
James R. Lloyd, David Duvenaud, Roger B. Grosse, Joshua B. Tenenbaum, and Zoubin Ghahramani. Automatic construction and natural-language description of nonparametric regression models. AAAI 2014.
- code
- examples
Roger B. Grosse, Chris J. Maddison, and Ruslan Salakhutdinov. Annealing between distributions by averaging moments. NIPS 2013.
- supplemental material
- preprint (from the ICML 2013 workshop Challenges in Representation Learning)
- background
David Duvenaud, James R. Lloyd, Roger B. Grosse, Joshua B. Tenenbaum, and Zoubin Ghahramani. Structure discovery in nonparametric regression through compositional kernel search. ICML 2013.
- code
- background
Roger B. Grosse, Ruslan Salakhutdinov, William T. Freeman, and Joshua B. Tenenbaum. Exploiting compositionality to explore a large space of model structures. UAI 2012. Best Student Paper.
- code
- background
Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Y. Ng. Unsupervised learning of hierarchical representations with convolutional deep belief networks. Communications of the ACM, vol. 54, no. 10, pp. 95-103, 2011.
Roger Grosse, Micah K. Johnson, Edward Adelson, and William T. Freeman. A ground-truth dataset and baseline evaluations for intrinsic image algorithms. ICCV 2009.
- project page
Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Y. Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. ICML 2009. Best Application Paper.
Roger Grosse, Rajat Raina, Helen Kwong, and Andrew Y. Ng. Shift-invariant sparse coding for audio classification. UAI 2007.
- code

Preprints

Roger Grosse, Juhan Bae, Cem Anil, et al., 2023. Studying Large Language Model Generalization with Influence Functions.
Cem Anil, Guodong Zhang, Yuhuai Wu, and Roger Grosse, 2021. Learning to Give Checkable Answers with Prover-Verifier Games.
Yuhuai Wu, Honghua Dong, Roger Grosse, and Jimmy Ba, 2020. The Scattering Compositional Learner: Discovering Objects, Attributes, Relationships in Analogical Reasoning.
Kevin Luk and Roger Grosse, 2018. A coordinate-free construction of scalable natural gradient.
Roger Grosse, Zoubin Ghahramani, and Ryan Adams, 2015. Sandwiching the marginal likelihood using bidirectional Monte Carlo.

Thesis

Model selection in compositional spaces. Ph.D. thesis, 2014.
