Bio
I lead the Visual Dynamics research group at Google Research. Our goal is to discover new ways to understand video, with an emphasis on objects, motion, and actions. The team's work includes the TensorFlow Object Detection API, as well as models that help power personal video understanding in Google Photos and Cloud Video Intelligence. We publish research at top academic conferences, including CVPR, and organize the AVA Challenge to advance the state of the art in spatiotemporal action recognition in video.
Previously I led the YouTube Mix team that built the personalized algorithmic radio feature at the heart of YouTube Music.
I obtained my Ph.D. in Machine Learning and Computer Vision from the University of Toronto, Canada.
Current Work
- Some of the latest open source releases from my team: the TF Object Detection API for TensorFlow 2.x, TF3D for 3D scene understanding, and the AIST++ Human Motion dataset.
- The results of the 3rd AVA Action Detection challenge are now available. This event was held at CVPR 2020, in partnership with the International Challenge on Activity Recognition (ActivityNet) workshop.
- My talk "Context & Attention for Detecting Objects and Actions in Video" at the CVPR'20 LSHVU Tutorial is available on YouTube.
- Our work on Capturing Special Video Moments with Google Photos was just featured on the Google AI Blog.
- I manage a research group in Perception, part of Google AI Research.
Publications
- A complete list of my publications and patents is available at Google Scholar Citations.
- Distribution Aware Metrics for Conditional Natural Language Generation
David M Chan, Yiming Ni, Austin Myers, Sudheendra Vijayanarasimhan, David A Ross, and John Canny. arXiv preprint, 2022. [arXiv]
- im2nerf: Image to Neural Radiance Field in the Wild
Lu Mi, Abhijit Kundu, David Ross, Frank Dellaert, Noah Snavely, and Alireza Fathi. arXiv preprint, 2022. [arXiv]
- What’s in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics
David M. Chan, Austin Myers, Sudheendra Vijayanarasimhan, David A. Ross, Bryan Seybold, John F. Canny. The 1st Workshop on Vision Datasets Understanding, at CVPR 2022. [arXiv]
- Optical Mouse: 3D Mouse Pose From Single-View Video
Bo Hu, Bryan Seybold, Shan Yang, David Ross, Avneesh Sud, Graham Ruby, and Yi Liu. CV4Animals: Computer Vision for Animal Behavior Tracking and Modeling Workshop, at CVPR 2021. [arXiv]
- AI Choreographer: Music Conditioned 3D Dance Generation with AIST++
Ruilong Li, Shan Yang, David A. Ross, Angjoo Kanazawa. ICCV 2021. [arXiv] project website, dataset
- Learning Video Representations from Textual Web Supervision
Jonathan C. Stroud, David A. Ross, Chen Sun, Jia Deng, Rahul Sukthankar, Cordelia Schmid. arXiv 2020. [arXiv]
- Active Learning for Video Description With Cluster-Regularized Ensemble Ranking
David Chan, Sudheendra Vijayanarasimhan, David Ross, John Canny. ACCV 2020. [arXiv], [PDF] supplementary
- An LSTM Approach to Temporal 3D Object Detection in LiDAR Point Clouds
Rui Huang, Wanyue Zhang, Tom Funkhouser, Abhijit Kundu, David Ross, Caroline Pantofaru, Alireza Fathi. ECCV 2020. [arXiv]
- Pillar-based Object Detection for Autonomous Driving
Yue Wang, Abhijit Kundu, Alireza Fathi, Caroline Pantofaru, David Ross, Justin Solomon, Tom Funkhouser. ECCV 2020. [arXiv]
- Virtual Multi-view Fusion for 3D Semantic Segmentation
Abhijit Kundu, Xiaoqi (Michael) Yin, Alireza Fathi, Brew Barrington, David Ross, Tom Funkhouser, Caroline Pantofaru. ECCV 2020. [arXiv]
- The AVA-Kinetics Localized Human Actions Video Dataset
Ang Li, Meghana Thotakuri, David A. Ross, João Carreira, Alexander Vostrikov, Andrew Zisserman. arXiv 2020. [arXiv] project website
- DOPS: Learning to Detect 3D Objects and Predict their 3D Shapes
Mahyar Najibi, Guangda Lai, Abhijit Kundu, Zhichao Lu, Vivek Rathod, Thomas Funkhouser, Caroline Pantofaru, David Ross, Larry S. Davis, Alireza Fathi. CVPR 2020. [arXiv]
- Speech2Action: Cross-modal Supervision for Action Recognition
Arsha Nagrani, Chen Sun, David Ross, Rahul Sukthankar, Cordelia Schmid, Andrew Zisserman. CVPR 2020. [PDF] [arXiv] project page, data
- D3D: Distilled 3D Networks for Video Action Recognition
Jonathan C. Stroud, David A. Ross, Chen Sun, Jia Deng, Rahul Sukthankar. WACV 2020. [arXiv] code and pre-trained models
- Rethinking the Faster R-CNN Architecture for Temporal Action Localization
Yu-Wei Chao, Sudheendra Vijayanarasimhan, Bryan Seybold, David A Ross, Jia Deng, Rahul Sukthankar. CVPR 2018. [arXiv] Google AI blog
- AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions
Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik. CVPR 2018. [arXiv] project website, Google AI blog
- On using nearly-independent feature families for high precision and confidence
Omid Madani, Manfred Georg, David Ross. Machine Learning Journal, 2013. [PDF]
- The Intervalgram: An audio feature for large-scale melody recognition
Thomas C. Walters, David Ross, Richard F. Lyon. 9th International Symposium on Computer Music Modeling and Retrieval (CMMR 2012). [PDF]
- On Using Nearly-Independent Feature Families for High Precision and Confidence
Omid Madani, Manfred Georg, David Ross. 4th Asian Conference on Machine Learning (ACML 2012). [PDF]
- Survey and Evaluation of Audio Fingerprinting Schemes for Mobile Query-by-Example Applications
Vijay Chandrasekhar, Matt Sharifi, David Ross. 12th International Society for Music Information Retrieval Conference (ISMIR 2011). [PDF]
- The Power of Comparative Reasoning
Jay Yagnik, Dennis Strelow, David Ross, Ruei-Sung Lin. ICCV 2011. [PDF]
- Automatic Language Identification in Music Videos with Low Level Audio and Visual Features
Vijay Chandrasekhar, Mehmet Emre Sargin, and David Ross. ICASSP 2011. [PDF]
- SPEC Hashing: Similarity Preserving algorithm for Entropy-based Coding
Ruei-Sung Lin, David Ross, and Jay Yagnik. CVPR 2010. [PDF]
- Learning Articulated Structure and Motion
David Ross, Daniel Tarlow, and Richard Zemel. International Journal of Computer Vision, 88 (2), 2010. [PDF] project website
- Learning Probabilistic Models for Visual Motion
David Ross, Ph.D. Thesis, University of Toronto, Canada, 2008. [PDF] videos
- Unsupervised learning of skeletons from motion
David Ross, Daniel Tarlow, and Richard Zemel. 10th European Conference on Computer Vision (ECCV 2008), 2008. [PDF] project website
- Learning stick-figure models using nonparametric Bayesian priors over trees
Edward Meeds, David Ross, Richard Zemel, and Sam Roweis. IEEE Conference on Computer Vision and Pattern Recognition, 2008. [PDF]
- Learning Articulated Skeletons From Motion
David Ross, Daniel Tarlow, and Richard Zemel. Workshop on Dynamical Vision at ICCV, 2007. [PDF] project website
- Incremental Learning for Robust Visual Tracking
David Ross, Jongwoo Lim, Ruei-Sung Lin, Ming-Hsuan Yang. International Journal of Computer Vision, Special Issue: Learning for Vision, 2008. [PS.GZ] [PDF] project website
- Inducing Features from Visual Noise
Andrew Cohen, Richard Shiffrin, Jason Gold, David Ross, and Michael Ross. Journal of Vision, 7(8):15, 2007. [PDF]
- Learning Parts-Based Representations of Data
David Ross and Richard Zemel. Journal of Machine Learning Research, 7(Nov):2369-2397, 2006. [PDF] project website
- Combining Discriminative Features to Infer Complex Trajectories
David Ross, Simon Osindero, and Richard Zemel. In Proceedings of the Twenty-Third International Conference on Machine Learning, 2006. [PS.GZ] [PDF] project website
- Incremental Learning for Visual Tracking
Jongwoo Lim, David Ross, Ruei-Sung Lin, Ming-Hsuan Yang. In L. Saul, Y. Weiss, and L. Bottou, editors, Advances in Neural Information Processing Systems 17, MIT Press, 2005. [PS.GZ] [PDF] project website
- Adaptive Discriminative Generative Model and Its Applications
Ruei-Sung Lin, David Ross, Jongwoo Lim, Ming-Hsuan Yang. In L. Saul, Y. Weiss, and L. Bottou, editors, Advances in Neural Information Processing Systems 17, MIT Press, 2005. [PS.GZ] [PDF] project website
- Adaptive Probabilistic Visual Tracking with Incremental Subspace Update
David Ross, Jongwoo Lim, Ming-Hsuan Yang. In T. Pajdla and J. Matas, editors, Proc. Eighth European Conference on Computer Vision (ECCV 2004), 2004. [PS.GZ] [PDF] project website
- Multiple Cause Vector Quantization
David Ross and Richard Zemel. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, MIT Press, 2003. [PS.GZ] [PDF] project website
- Learning Parts-Based Representations of Data (thesis version)
David Ross, University of Toronto, M.Sc. Thesis, 2003. [PS.GZ] [PDF] project website
- BibTeX entries for all of the above are available here.
Code
- D3D: Distilled 3D Networks TensorFlow code and pre-trained model checkpoints can be found here.
- AVA Atomic Visual Actions evaluation code can be found in the ActivityNet Github repo. Find the data here.
- The source code for most of my older research projects is available for download here. Included are Matlab implementations of a number of machine learning & computer vision algorithms, but there are also a few other hacks.
- Parallel Computing: Here is some code I've written/modified, as well as some getting-started tips for parallel computing using Matlab.
- The code for the "Combining Discriminative Features" learning/tracking algorithm is available. cdf_2007-07-13.zip