David Ross - University of Toronto, Canada

Bio

I lead the Visual Dynamics research group at Google Research. Our goal is to discover new ways to understand video, with an emphasis on objects, motion, and actions. The team's work includes the TensorFlow Object Detection API, as well as models that help power personal video understanding in Google Photos and Cloud Video Intelligence. We publish research at top academic conferences including CVPR, and organize the AVA Challenge to advance the state of the art in spatiotemporal action recognition in video.

Previously, I led the YouTube Mix team that built the personalized algorithmic radio feature at the heart of YouTube Music.

I obtained my Ph.D. in Machine Learning and Computer Vision from the University of Toronto, Canada.

Current Work

Some of the latest open source releases from my team: TF Object Detection API for TensorFlow 2.x, TF3D for 3D Scene Understanding, and the AIST++ Human Motion dataset.
The results of the 3rd AVA Action Detection challenge are now available. This event was held at CVPR 2020, in partnership with the International Challenge on Activity Recognition (ActivityNet) workshop.
My talk "Context & Attention for Detecting Objects and Actions in Video" at the CVPR'20 LSHVU Tutorial is available on YouTube.
Our work on Capturing Special Video Moments with Google Photos was just featured on the Google AI Blog.
I manage a research group in Perception, part of Google AI Research.

Publications

A complete list of my publications and patents is available at Google Scholar Citations.
Distribution Aware Metrics for Conditional Natural Language Generation.
David M Chan, Yiming Ni, Austin Myers, Sudheendra Vijayanarasimhan, David A Ross, and John Canny. arXiv preprint, 2022. [arXiv]
im2nerf: Image to Neural Radiance Field in the Wild.
Lu Mi, Abhijit Kundu, David Ross, Frank Dellaert, Noah Snavely, and Alireza Fathi. arXiv preprint, 2022. [arXiv]
What’s in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics.
David M. Chan, Austin Myers, Sudheendra Vijayanarasimhan, David A. Ross, Bryan Seybold, John F. Canny. The 1st Workshop on Vision Datasets Understanding, at CVPR 2022. [arXiv]
Optical Mouse: 3D Mouse Pose From Single-View Video.
Bo Hu, Bryan Seybold, Shan Yang, David Ross, Avneesh Sud, Graham Ruby, and Yi Liu. CV4Animals: Computer Vision for Animal Behavior Tracking and Modeling Workshop, at CVPR 2021. [arXiv]
AI Choreographer: Music Conditioned 3D Dance Generation with AIST++
Ruilong Li, Shan Yang, David A. Ross, Angjoo Kanazawa. ICCV 2021. [arXiv] project website, dataset
Learning Video Representations from Textual Web Supervision
Jonathan C. Stroud, David A. Ross, Chen Sun, Jia Deng, Rahul Sukthankar, Cordelia Schmid. arXiv 2020. [arXiv]
Active Learning for Video Description With Cluster-Regularized Ensemble Ranking
David Chan, Sudheendra Vijayanarasimhan, David Ross, John Canny. ACCV 2020. [arXiv], [PDF] supplementary
An LSTM Approach to Temporal 3D Object Detection in LiDAR Point Clouds
Rui Huang, Wanyue Zhang, Tom Funkhouser, Abhijit Kundu, David Ross, Caroline Pantofaru, Alireza Fathi. ECCV 2020. [arXiv]
Pillar-based Object Detection for Autonomous Driving
Yue Wang, Abhijit Kundu, Alireza Fathi, Caroline Pantofaru, David Ross, Justin Solomon, Tom Funkhouser. ECCV 2020. [arXiv]
Virtual Multi-view Fusion for 3D Semantic Segmentation
Abhijit Kundu, Xiaoqi (Michael) Yin, Alireza Fathi, Brew Barrington, David Ross, Tom Funkhouser, Caroline Pantofaru. ECCV 2020. [arXiv]
The AVA-Kinetics Localized Human Actions Video Dataset
Ang Li, Meghana Thotakuri, David A. Ross, João Carreira, Alexander Vostrikov, Andrew Zisserman. arXiv 2020. [arXiv] project website
DOPS: Learning to Detect 3D Objects and Predict their 3D Shapes
Mahyar Najibi, Guangda Lai, Abhijit Kundu, Zhichao Lu, Vivek Rathod, Thomas Funkhouser, Caroline Pantofaru, David Ross, Larry S. Davis, Alireza Fathi. CVPR 2020. [arXiv]
Speech2Action: Cross-modal Supervision for Action Recognition
Arsha Nagrani, Chen Sun, David Ross, Rahul Sukthankar, Cordelia Schmid, Andrew Zisserman. CVPR 2020. [PDF] [arXiv] project page, data
D3D: Distilled 3D Networks for Video Action Recognition
Jonathan C. Stroud, David A. Ross, Chen Sun, Jia Deng, Rahul Sukthankar. WACV 2020. [arXiv] code and pre-trained models
Rethinking the Faster R-CNN Architecture for Temporal Action Localization
Yu-Wei Chao, Sudheendra Vijayanarasimhan, Bryan Seybold, David A Ross, Jia Deng, Rahul Sukthankar. CVPR 2018. [arXiv] Google AI blog
AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions
Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik. CVPR 2018. [arXiv] project website, Google AI blog
On using nearly-independent feature families for high precision and confidence
Omid Madani, Manfred Georg, David Ross. Machine Learning Journal, 2013. [PDF]
The Intervalgram: An audio feature for large-scale melody recognition
Thomas C. Walters, David Ross, Richard F. Lyon. 9th International Symposium on Computer Music Modeling and Retrieval (CMMR 2012). [PDF]
On Using Nearly-Independent Feature Families for High Precision and Confidence
Omid Madani, Manfred Georg, David Ross. 4th Asian Conference on Machine Learning (ACML 2012). [PDF]
Survey and Evaluation of Audio Fingerprinting Schemes for Mobile Query-by-Example Applications
Vijay Chandrasekhar, Matt Sharifi, David Ross. 12th International Society for Music Information Retrieval Conference (ISMIR 2011). [PDF]
The Power of Comparative Reasoning
Jay Yagnik, Dennis Strelow, David Ross, Ruei-Sung Lin. ICCV 2011. [PDF]
Automatic Language Identification in Music Videos with Low Level Audio and Visual Features
Vijay Chandrasekhar, Mehmet Emre Sargin, and David Ross. ICASSP 2011. [PDF]
SPEC Hashing: Similarity Preserving algorithm for Entropy-based Coding
Ruei-Sung Lin, David Ross, and Jay Yagnik. CVPR 2010. [PDF]
Learning Articulated Structure and Motion
David Ross, Daniel Tarlow, and Richard Zemel. International Journal of Computer Vision, 88 (2), 2010. [PDF] project website
Learning Probabilistic Models for Visual Motion
David Ross, Ph.D. Thesis, University of Toronto, Canada, 2008. [PDF] videos
Unsupervised learning of skeletons from motion
David Ross, Daniel Tarlow, and Richard Zemel. 10th European Conference on Computer Vision (ECCV 2008), 2008. [PDF] project website
Learning stick-figure models using nonparametric Bayesian priors over trees
Edward Meeds, David Ross, Richard Zemel, and Sam Roweis. IEEE Conference on Computer Vision and Pattern Recognition, 2008. [PDF]
Learning Articulated Skeletons From Motion
David Ross, Daniel Tarlow, and Richard Zemel. Workshop on Dynamical Vision at ICCV, 2007. [PDF] project website
Incremental Learning for Robust Visual Tracking
David Ross, Jongwoo Lim, Ruei-Sung Lin, Ming-Hsuan Yang.
In the International Journal of Computer Vision, Special Issue: Learning for Vision, 2008. [PS.GZ] [PDF] project website
Inducing Features from Visual Noise
Andrew Cohen, Richard Shiffrin, Jason Gold, David Ross, and Michael Ross. Journal of Vision, 7(8):15, 2007. [PDF]
Learning Parts-Based Representations of Data
David Ross and Richard Zemel. Journal of Machine Learning Research, 7(Nov):2369-2397, 2006. [PDF] project website
Combining Discriminative Features to Infer Complex Trajectories
David Ross, Simon Osindero, and Richard Zemel. In Proceedings of the Twenty-Third International Conference on Machine Learning, 2006. [PS.GZ] [PDF] project website
Incremental Learning for Visual Tracking
Jongwoo Lim, David Ross, Ruei-Sung Lin, Ming-Hsuan Yang
In L. Saul, Y. Weiss, and L. Bottou, editors, Advances in Neural Information Processing Systems 17, MIT Press, 2005. [PS.GZ] [PDF] project website
Adaptive Discriminative Generative Model and Its Applications
Ruei-Sung Lin, David Ross, Jongwoo Lim, Ming-Hsuan Yang
In L. Saul, Y. Weiss, and L. Bottou, editors, Advances in Neural Information Processing Systems 17, MIT Press, 2005. [PS.GZ] [PDF] project website
Adaptive Probabilistic Visual Tracking with Incremental Subspace Update
David Ross, Jongwoo Lim, Ming-Hsuan Yang
In T. Pajdla and J. Matas, editors, Proc. Eighth European Conference on Computer Vision (ECCV 2004), 2004. [PS.GZ] [PDF] project website
Multiple Cause Vector Quantization
David Ross and Richard Zemel
In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, MIT Press, 2003. [PS.GZ] [PDF] project website
Learning Parts-Based Representations of Data (thesis version)
David Ross, University of Toronto, M.Sc. Thesis, 2003. [PS.GZ] [PDF] project website
BibTeX entries for all of the above are available here.

Code

D3D: Distilled 3D Networks TensorFlow code and pre-trained model checkpoints can be found here.
AVA Atomic Visual Actions evaluation code can be found in the ActivityNet GitHub repo. Find the data here.
The source code for most of my older research projects is available for download here. Included are Matlab implementations of a number of machine learning & computer vision algorithms, but there are also a few other hacks.
Parallel Computing: Here is some code I've written/modified, as well as some getting-started tips for parallel computing using Matlab.
The code for the "Combining Discriminative Features" learning/tracking algorithm is available: cdf_2007-07-13.zip

Other Stuff

Photos of my dog.

David Ross, Ph.D.