Steven Luo

University of Toronto (UofT)
Email: stevenlts[at] (research); (connect)
Links: {github, linkedin, scholar}

[about] [projects] [blogs] [cv]

About Me

Hi! Nice to meet you (virtually)! I am an aspiring ML researcher currently pursuing the Computer Science Specialist at the University of Toronto (year 3). My research interests revolve around `computer vision`, `graphics`, and `neural fields`.

My ML journey started with explorations into computer vision back in 2018, when the state of the art was still mainly CNNs. Since then, I've worked on numerous projects spanning object classification (lip reading), detection (real-time hand gesture detection), and generation (font style transfer & single-eye face generation).

When I came to UofT, I took a quick detour into the intersection of computer vision and neuroscience. At the UTSC CoNSens Lab, I developed a novel classification/detection model analogous to the ventral/dorsal streams of the human brain. We used explainability methods such as guided backpropagation to understand the feature detectors in the network and draw correlations to the detectors found in a biological brain.

In 2023, I returned to pure ML research and found a passion in neural fields and 3D graphics. With my colleagues at the University of Hong Kong, we developed a novel method that exploits the underlying patterns of a neural field's activations to reduce the asymptotic inference cost of implicit neural fields by up to 300x. Orthogonal to implicit neural fields are their explicit counterparts, such as Instant-NGP, which are state-of-the-art methods for 3D representation but remain poorly understood. This led to my current work with Professor David Lindell at the University of Toronto, where I am developing a framework for quantifying the expressivity of grid-based neural fields. This would hopefully serve as a heuristic for explaining the effectiveness of these neural fields and aid the selection of network configurations.

Besides research, I also find joy in sharing my experiences through leadership. From founding my high school's first student-led team, the AI Project Group, to serving as vice president of the UofT Machine Intelligence Student Team, I have always loved exploring the broader perspectives of AI through collaboration with like-minded peers.

Affiliations: UTMIST (community)

Publications | [Other Projects]

  1. Nonparametric Teaching of Implicit Neural Representations
    Chen Zhang, Steven Tin Sui Luo, Jason Chun Lok Li, Yik-Chung Wu, Ngai Wong
    ICML 2024

    We investigate the learning of implicit neural representation (INR) using an overparameterized multilayer perceptron (MLP) via a novel nonparametric teaching perspective. The latter offers an efficient example selection framework for teaching nonparametrically defined (viz. non-closed-form) target functions, such as image functions defined by 2D grids of pixels. To address the costly training of INRs, we propose a paradigm called Implicit Neural Teaching (INT) that treats INR learning as a nonparametric teaching problem, where the given signal being fitted serves as the target function. The teacher then selects signal fragments for iterative training of the MLP to achieve fast convergence. By establishing a connection between MLP evolution through parameter-based gradient descent and that of function evolution through functional gradient descent in nonparametric teaching, we show for the first time that teaching an overparameterized MLP is consistent with teaching a nonparametric learner. This new discovery readily permits a convenient drop-in of nonparametric teaching algorithms to broadly enhance INR training efficiency, demonstrating 30%+ training time savings across various input modalities.
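    The teacher's core move in INT is greedy example selection: at each round, train only on the signal fragments the learner currently fits worst. The sketch below is a hedged illustration of that loop, not the paper's implementation — in particular, the overparameterized MLP is replaced by a direct per-pixel parameter vector so the convergence behavior is easy to see; only the teacher logic (pick the highest-error fragments, then take a gradient step on them) reflects the idea described above.

    ```python
    # Toy Implicit Neural Teaching loop (illustrative only).
    # The "learner" is a per-pixel vector standing in for the MLP; the
    # "teacher" selects the k worst-fit coordinates each round.
    import numpy as np

    coords = np.linspace(-1.0, 1.0, 64)
    target = np.sin(3.0 * np.pi * coords)   # the signal being fitted
    pred = np.zeros_like(target)            # learner's current reconstruction

    lr, k = 0.5, 8                          # step size, fragments per round
    for step in range(200):
        residual = pred - target
        # Teacher: choose the k coordinates with the largest absolute error.
        picked = np.argsort(-np.abs(residual))[:k]
        # Functional-gradient-style update on the selected fragments only.
        pred[picked] -= lr * residual[picked]

    print(float(np.abs(pred - target).max()))  # worst-case fit error
    ```

    Because every round attacks the largest residuals first, the fit error shrinks geometrically even though only a small fraction of the signal is touched per step — the same intuition behind INT's reported training-time savings.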

  2. ASMR: Activation-Sharing Multi-Resolution Coordinate Networks for Efficient Inference
    Steven Tin Sui Luo*, Jason Chun Lok Li*, Le Xu & Ngai Wong
    ICLR 2024

    Coordinate networks, or implicit neural representations (INRs), are a fast-emerging method for encoding natural signals (such as images and videos) with the benefits of a compact neural representation. While numerous methods have been proposed to increase the encoding capabilities of an INR, an often overlooked aspect is inference efficiency, usually measured in multiply-accumulate (MAC) count. This is particularly critical in use cases where inference bandwidth is greatly limited by hardware constraints. To this end, we propose the Activation-Sharing Multi-Resolution (ASMR) coordinate network that combines multi-resolution coordinate decomposition with hierarchical modulations. Specifically, an ASMR model enables the sharing of activations across grids of the data. This largely decouples its inference cost from its depth, which is directly correlated with its reconstruction capability, and renders a near-O(1) inference complexity irrespective of the number of layers. Experiments show that ASMR can reduce the MAC of a vanilla SIREN model by up to 350x while achieving an even higher reconstruction quality than its SIREN baseline.
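    The MAC savings come from a simple accounting argument: if a layer's activations depend on the coordinate only at a coarse resolution, they can be computed once per coarse cell and shared, instead of once per output sample. The back-of-the-envelope sketch below is a hedged illustration of that accounting with made-up numbers (hidden width, per-layer grid sizes) — it is not ASMR's actual architecture or its reported 350x figure.

    ```python
    # Illustrative MAC accounting for activation sharing (hypothetical sizes).
    width = 32                          # hidden width (assumed)
    n_samples = 4096                    # number of output coordinates
    resolutions = [8, 64, 512, 4096]    # assumed per-layer grid sizes

    # Vanilla MLP: every layer runs once per output sample.
    vanilla_macs = n_samples * len(resolutions) * width * width

    # Activation sharing: each layer runs once per cell of its own grid,
    # and coarse-cell activations are reused by all samples inside the cell.
    shared_macs = sum(r * width * width for r in resolutions)

    print(vanilla_macs / shared_macs)   # MAC reduction factor
    ```

    Note that the shared cost is dominated by the finest layer alone, so adding more coarse layers (i.e., depth) barely changes the total — which is the sense in which inference cost becomes nearly independent of the number of layers.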

  3. Task-Agnostic Approach to Modeling the Ventral and Dorsal Stream
    Steven Tin Sui Luo*, Tahsin Rehza* & Matthias Niemeier
    MAIN 2023

(* equal contributions)



  1. How sharing activations can reduce the inference cost of neural fields by up to 300x.


  1. Condensed summary of the UofT course CSC412: Probabilistic Machine Learning.