I am a Ph.D. student in Computer Science at the University of Toronto, where I am fortunate to be advised by Chris Maddison and Jimmy Ba. Currently, I am also a visiting scholar at Stanford University, hosted by Tatsunori Hashimoto.
Previously, I was a student researcher at Google Research and a research intern at Microsoft Research. In summer 2019, I was a visiting student at UCLA, where I worked with Cho-Jui Hsieh. I obtained my Bachelor's degree in Information Engineering from Zhejiang University.
I am currently on the industry job market.
Research
My research focuses on new scaling paradigms for language models and agents in data-constrained settings.
* denotes equal contribution
Reasoning to Learn from Latent Thoughts
Yangjun Ruan,
Neil Band,
Chris J Maddison,
and Tatsunori Hashimoto
arXiv preprint arXiv:2503.18866,
2025
TL;DR: We introduce "reasoning to learn", a new data-efficient pretraining paradigm that allows an LM to bootstrap its capability on limited, task-agnostic data.
Observational Scaling Laws and the Predictability of Language Model Performance
Yangjun Ruan,
Chris J Maddison,
and Tatsunori Hashimoto
In Advances in Neural Information Processing Systems
(NeurIPS),
2024
[Spotlight]
TL;DR: We introduce observational scaling laws that unify a large set of public LMs in a shared capability space, enabling a low-cost, high-resolution, and broad-coverage scaling analysis of complex LM capabilities.
Weighted Ensemble Self-Supervised Learning
Yangjun Ruan,
Saurabh Singh,
Warren Morningstar,
Alexander A. Alemi,
Sergey Ioffe,
Ian Fischer,
and Joshua V. Dillon
In International Conference on Learning Representations
(ICLR),
2023
TL;DR: An efficient training-time ensemble method for improving self-supervised representation learning, achieving SOTA results on ImageNet SSL & few-shot benchmarks.
Optimal Representations for Covariate Shift
Yangjun Ruan*,
Yann Dubois*,
and Chris J Maddison
In International Conference on Learning Representations
(ICLR),
2022
TL;DR: We derive a self-supervised objective for learning optimally robust representations under covariate shift, offering insights into CLIP’s robustness and further enhancing its distributional robustness.
Improving Lossless Compression Rates via Monte Carlo Bits-Back Coding
Yangjun Ruan*,
Karen Ullrich*,
Daniel Severo*,
James Townsend,
Ashish Khisti,
Arnaud Doucet,
Alireza Makhzani,
and Chris J Maddison
In International Conference on Machine Learning
(ICML),
2021
[Oral]
TL;DR: We introduce the Monte Carlo bits-back coding framework for deriving asymptotically optimal compression algorithms from tighter variational bounds.
Selected Awards & Honors
- Ontario Graduate Scholarship, 2023
- DiDi Graduate Student Award, 2021
- Chu Kochen Scholarship (highest honor at Zhejiang University), 2019
- Cross-disciplinary Scholars in Science and Technology (CSST), UCLA, 2019
- National Scholarship (top 1.5%), 2017, 2018, 2019
- Meritorious Winner, Interdisciplinary Contest in Modeling (ICM), 2018