Research

AI Safety

Out-of-Distribution (OOD) Detection

Traditionally, Machine Learning (ML) relies on the assumption that data is independent and identically distributed (IID) during both training and testing. However, in practice, this assumption rarely holds, making it somewhat miraculous that ML models perform as well as they do. As ML models increasingly influence high-stakes decisions, it is crucial to understand their limitations, particularly when the data they encounter does not align with the training distribution.

Out-of-Distribution (OOD) Detection, also known as anomaly detection, novelty detection, or outlier detection, aims to identify samples that deviate significantly from the training distribution. Such samples should be flagged for human intervention or more sophisticated handling, because the model's ability to generalize to them is uncertain and not guaranteed. This challenge operationalizes a fundamental question in epistemology: How does one know when one knows?
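To make the task concrete, here is a minimal sketch of a standard score-and-threshold baseline (maximum softmax probability). It is illustrative only and is not the likelihood-path method developed in [1]; the function names and the toy logits are my own for the example.

```python
import numpy as np

def max_softmax_score(logits):
    """Confidence score: maximum softmax probability (a common OOD baseline)."""
    z = logits - logits.max(axis=-1, keepdims=True)            # numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)  # softmax
    return probs.max(axis=-1)

def flag_ood(logits, threshold=0.7):
    """Flag inputs whose confidence falls below the threshold as OOD."""
    return max_softmax_score(logits) < threshold

# Toy usage: a peaked (confident) prediction and a flat (uncertain) one.
logits = np.array([[6.0, 0.5, -1.0],
                   [0.2, 0.1,  0.15]])
print(flag_ood(logits))  # [False  True]
```

The threshold is typically tuned on held-out in-distribution data (e.g., to a target false-positive rate); everything below it gets routed to a human or a fallback system.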

Our work [1] is the first to provide theoretical guarantees for OOD detection in the most challenging scenarios, where there are no ground truth labels, no prior domain knowledge, and no batches of test data available. Moreover, we have achieved state-of-the-art (SOTA) performance on widely-used benchmarks.

Uncertainty Estimation in Large Language Models

In a related area, I have also worked on improving uncertainty estimation in large language models (LLMs) using prompt ensembles [2]. Our approach is entirely unsupervised and applicable to closed-source models.
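As a rough illustration of the general prompt-ensembling idea (not the exact recipe in [2]): query the model with several paraphrases of the same question and use the agreement across answers as a confidence estimate. In the sketch below, `ask_model` is a hypothetical stand-in for a call to a (possibly closed-source) model API.

```python
from collections import Counter

def prompt_ensemble_confidence(question, paraphrase_prompts, ask_model):
    """Estimate answer confidence by majority vote over paraphrased prompts.

    `ask_model(prompt)` is a hypothetical callable returning the model's
    answer string for a single prompt (e.g., a thin wrapper around a chat API).
    """
    answers = [ask_model(p.format(question=question)) for p in paraphrase_prompts]
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / len(answers)  # empirical agreement as confidence

# Hypothetical usage with three paraphrases of the same question:
paraphrases = [
    "Q: {question}\nA:",
    "Answer the following question concisely. {question}",
    "{question} Respond with just the answer.",
]
# answer, confidence = prompt_ensemble_confidence("What is 7 * 8?", paraphrases, ask_model)
```

Because this only needs answer samples, not token-level probabilities, it works even when the model exposes nothing but a text-completion interface.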

[1] Sicong Huang, Jiawei He, and Kry Yik Chau Lui. Rethinking test-time likelihood: The likelihood path principle and its application to OOD detection, 2024
[2] Mingjian Jiang, Yangjun Ruan, Sicong Huang, Saifei Liao, Silviu Pitis, Roger Baker Grosse, and Jimmy Ba. Calibrating language models via augmented prompt ensembles. In NeurIPS Challenges of Deploying Generative AI Workshop 2023, 2023

Behavioural Sciences

AI Psychology

I was a winner of Anthropic’s Inverse Scaling competition [3], where we demonstrated that LLMs can mimic human cognitive biases, with these biases scaling with model size. In other words, larger models more closely replicate human cognitive biases, and such biases can be potentially catastrophic whether they occur in humans or in AIs. Comparing the “psychology” of LLMs and humans is nuanced and difficult, as small perturbations can lead to significantly different results, and I believe we still do not have definitive answers on this topic. If you’re interested, I highly recommend Noah Goodman’s AI Psychology course, which I audited in Winter 2024. This project grew out of my interests in AI Safety and human behavior, and I’m constantly searching for ways to bridge the gap between these two fields.

Social Psychology

Understanding human behavior is my life’s passion. I am deeply intrigued by every facet of human behavior. If I had to pick a favorite, it’d be motivation theories (though probably not exactly the kind of motivation you’d see from popular science or motivational speakers). I found Jason Plaks’s graduate course (textbook) to be the most profound and systematic exploration of this topic. I applied these insights to study student motivation in soft skills training [4] and research programs [5].

[4] En-Shiun Annie Lee, Luki Danukarjanto, Sadia Sharmin, Shou-Yi Hung, Sicong Huang, and Tong Su. Exploring student motivation in integration of soft skills training within three levels of computer science programs. In Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 1, pages 708–714, 2024
[5] Sadia Sharmin, Sicong Huang, and Robert Soden. Impact of undergraduate research workshops on sense of belonging and self-efficacy based on gender and race. In Proceedings of the 23rd Koli Calling International Conference on Computing Education Research, pages 1–10, 2023

Generative Modelling

My deep learning journey began with deep generative models (DGMs). We developed generative models for unsupervised cipher cracking [6] and musical timbre transfer [7]. I was fascinated by what these models could do and wanted to understand them at the most fundamental level. This curiosity led me to explore information-theoretic perspectives on generative models, such as evaluating the lossy compression rates of deep generative models [8] and improving mutual information estimation in deep generative models [9].

[6] Aidan N. Gomez, Sicong Huang, Ivan Zhang, Bryan M. Li, Muhammad Osama, and Lukasz Kaiser. Unsupervised cipher cracking using discrete GANs. In International Conference on Learning Representations, 2018
[7] Sicong Huang, Qiyang Li, Cem Anil, Xuchan Bao, Sageev Oore, and Roger B. Grosse. Timbretron: A wavenet(cycleGAN(CQT(audio))) pipeline for musical timbre transfer. In International Conference on Learning Representations, 2019
[8] Sicong Huang, Alireza Makhzani, Yanshuai Cao, and Roger Grosse. Evaluating lossy compression rates of deep generative models. In International Conference on Machine Learning, 2020
[9] Rob Brekelmans, Sicong Huang, Marzyeh Ghassemi, Greg Ver Steeg, Roger Baker Grosse, and Alireza Makhzani. Improving mutual information estimation with annealed and energy-based bounds. In International Conference on Learning Representations, 2022
[*]: Denotes equal contribution.

Continual and Reinforcement Learning

The last ingredient for forecasting human behaviour with AI Agents is a learning algorithm that can actively and continually acquire information and learn from feedback. I have worked on estimating neural network function space distance, demonstrating its effectiveness for continual learning [10]. I also contributed to a reinforcement learning codebase designed to make it easier for researchers to reproduce results and iterate on ideas quickly [11].

[10] Nikita Dhawan, Sicong Huang, Juhan Bae, and Roger Baker Grosse. Efficient parametric approximations of neural network function space distance. In International Conference on Machine Learning, pages 7795–7812. PMLR, 2023
[11] Bryan M. Li, Alexander Cowen-Rivers, Piotr Kozakowski, David Tao, Siddhartha Rao Kamalakara, Nitarshan Rajkumar, Hariharan Sezhiyan, Sicong Huang, and Aidan N. Gomez. Generic reinforcement learning codebase in tensorflow. In The Journal of Open Source Software, 2019


I am currently putting all these ingredients together to build AI Agents that improve the accuracy and calibration of our forecasts of human behaviour.

See a full list of my publications on Google Scholar.