About

I am an Associate Professor of Computer Science at the University of Toronto, Schwartz Reisman Chair in Technology and Society, and a founding member of the Vector Institute. I am also a Member of Technical Staff on the Alignment Science Team at Anthropic, where my work focuses on training data attribution. I hold a Schmidt Sciences AI2050 Senior Fellowship, a Sloan Research Fellowship, and a Canada CIFAR AI Chair.

My research has focused on better understanding neural net training dynamics, and on using that understanding to improve training speed, generalization, uncertainty estimation, and automatic hyperparameter tuning. I'm now focusing on applying our understanding of deep learning to AI alignment. Given how fast AI is progressing, ensuring that AIs are robustly aligned with human values seems like the most important problem we can be working on now. Specific directions I'm interested in include:

  • Given a surprising behavior from an AI system, how can we efficiently figure out which training examples were responsible (so that we can then investigate the detailed mechanisms)?
  • How can we assure the safety of AI systems powerful enough to engage in strategic deception?
  • How can we elicit reliable information from models we don't fully trust?
  • How can we efficiently remove dangerous or unwanted information from a trained model?

Contact

Department of Computer Science
University of Toronto
Office: Pratt 290F
6 King's College Rd.
Toronto, ON M5S 3G4, Canada
Phone: 416-978-7391
e-mail: rgrosse_at_cs_toronto_edu