Bin Yang  

I'm a PhD candidate at the University of Toronto, advised by Prof. Raquel Urtasun. I'm also part of Waabi, working on self-driving trucks. I'm a recipient of the Microsoft Research PhD Fellowship (2021) and the NVIDIA Pioneer Award (2018).

I obtained my bachelor's degree from China Agricultural University in 2014. From 2014 to 2016, I was very fortunate to work with Prof. Stan Z. Li, Prof. Zhen Lei, and Dr. Junjie Yan on face recognition and object detection from images and videos.

My general research interest lies in data-driven end-to-end solutions for intelligent agents, with a focus on these topics in the self-driving domain:

  • Efficient 3D Object Detection: PIXOR, SBNet, PLUMENet
  • Multi-sensor Fusion: ContFuse, HDNet, MMF, FuseNet, RadarNet
  • Joint Perception & Prediction: FAF, PnPNet, InteractTransformer, V2VNet
  • End-to-end Neural Motion Planner: NMP, DSDNet, SA-NMP
  • Learning-based Simulation: LiDARSim, PnPSim, LiME

    Email  /  Google Scholar  /  LinkedIn

    News

  • 2 papers have been accepted by IROS 2021.
  • 1 paper has been accepted by ICRA 2021.
  • I received the 2021 Microsoft Research PhD Fellowship.
  • Code release for LaneGCN.
  • 1 paper (spotlight) has been accepted by CoRL 2020.
  • 5 papers (2 orals) have been accepted by ECCV 2020.
  • 1 paper has been accepted by IROS 2020.
  • 3 papers (1 oral) have been accepted by CVPR 2020.
  • 1 paper has been accepted by ICCV 2019.
  • 2 papers (1 oral) have been accepted by CVPR 2019.
  • Code release for SBNet and example reweighting.
  • 6 papers (2 orals, 2 spotlights) have been accepted in 2018 by CVPR/ICML/ECCV/CoRL.

    Publications

    PLUMENet: Efficient 3D Object Detection from Stereo Images
    Yan Wang, Bin Yang, Rui Hu, Ming Liang, Raquel Urtasun
    International Conference on Intelligent Robots and Systems (IROS), 2021

    PLUME = pseudo-LiDAR feature volume.

    We got 1st place on the KITTI BEV detection leaderboard (car, among stereo methods without extra training data).

    Diverse Complexity Measures for Dataset Curation in Self-driving
    Abbas Sadat, Sean Segal, Sergio Casas, James Tu, Bin Yang, Raquel Urtasun, Ersin Yumer
    International Conference on Intelligent Robots and Systems (IROS), 2021

    Automatic selection of interesting self-driving logs.

    Auto4D: Learning to Label 4D Objects from Sequential Point Clouds
    Bin Yang, Min Bai, Ming Liang, Wenyuan Zeng, Raquel Urtasun
    Technical Report, 2021

    Improve low-quality object tracks by enforcing a fixed object size and smoother motion.

    Perceive, Attend, and Drive: Learning Spatial Attention for Safe Self-Driving
    Bob Wei*, Mengye Ren*, Wenyuan Zeng, Ming Liang, Bin Yang, Raquel Urtasun
    International Conference on Robotics and Automation (ICRA), 2021

    Learn where to attend for an end-to-end neural motion planner.

    Recovering and Simulating Pedestrians in the Wild
    Ze Yang, Siva Manivasagam, Ming Liang, Bin Yang, Wei-Chiu Ma, Raquel Urtasun
    Conference on Robot Learning (CoRL), 2020 (Spotlight)
    video

    Pedestrian shape and pose reconstruction from in-the-wild multi-sensor data.

    Learning Lane Graph Representations for Motion Forecasting
    Ming Liang, Bin Yang, Rui Hu, Yun Chen, Renjie Liao, Song Feng, Raquel Urtasun
    European Conference on Computer Vision (ECCV), 2020 (Oral)
    slides / code

    A new map representation (the lane graph) and a new operator (LaneConv) that acts on it.

    We got 1st place on the Argoverse motion forecasting leaderboard (ADE/FDE metrics).

    V2VNet: Vehicle-to-Vehicle Communication for Joint Perception and Prediction
    Tsun-Hsuan Wang, Siva Manivasagam, Ming Liang, Bin Yang, Wenyuan Zeng, Raquel Urtasun
    European Conference on Computer Vision (ECCV), 2020 (Oral)

    Model vehicle-to-vehicle communication with a graph neural network.

    RadarNet: Exploiting Radar for Robust Perception of Dynamic Objects
    Bin Yang*, Runsheng Guo*, Ming Liang, Sergio Casas, Raquel Urtasun
    European Conference on Computer Vision (ECCV), 2020
    slides

    Multi-level fusion of LiDAR & Radar.

    Testing the Safety of Self-driving Vehicles by Simulating Perception and Prediction
    Kelvin Wong*, Qiang Zhang*, Ming Liang, Bin Yang, Renjie Liao, Abbas Sadat, Raquel Urtasun
    European Conference on Computer Vision (ECCV), 2020

    PnPSim(scene) ≈ PnPNet(sensor data of the scene): simulate the outputs of perception and prediction directly from the scene.

    DSDNet: Deep Structured self-Driving Network
    Wenyuan Zeng, Shenlong Wang, Renjie Liao, Yun Chen, Bin Yang, Raquel Urtasun
    European Conference on Computer Vision (ECCV), 2020

    Deep structured model for probabilistic multimodal prediction.

    End-to-end Contextual Perception and Prediction with Interaction Transformer
    Lingyun Luke Li, Bin Yang, Ming Liang, Wenyuan Zeng, Mengye Ren, Sean Segal, Raquel Urtasun
    International Conference on Intelligent Robots and Systems (IROS), 2020

    Adapt the Transformer to model multi-agent interactions in trajectory prediction.

    PnPNet: End-to-End Perception and Prediction with Tracking in the Loop
    Ming Liang*, Bin Yang*, Wenyuan Zeng, Yun Chen, Rui Hu, Sergio Casas, Raquel Urtasun
    Computer Vision and Pattern Recognition (CVPR), 2020
    slides

    The first P&P model that solves detect->track->predict end-to-end.

    LiDARsim: Realistic LiDAR Simulation by Leveraging the Real World
    Siva Manivasagam, Shenlong Wang, Kelvin Wong, Wenyuan Zeng, Mikita Sazanovich, Shuhan Tan, Bin Yang, Wei-Chiu Ma, Raquel Urtasun
    Computer Vision and Pattern Recognition (CVPR), 2020 (Oral)

    Realistic sensor simulation of LiDAR for closed-loop evaluation.

    Physically Realizable Adversarial Examples for LiDAR Object Detection
    James Tu, Mengye Ren, Siva Manivasagam, Ming Liang, Bin Yang, Richard Du, Frank Cheng, Raquel Urtasun
    Computer Vision and Pattern Recognition (CVPR), 2020

    Universal rooftop attack that hides vehicles from LiDAR-based object detectors.

    Learning Joint 2D-3D Representations for Depth Completion
    Yun Chen, Bin Yang, Ming Liang, Raquel Urtasun
    International Conference on Computer Vision (ICCV), 2019

    We propose the 2D-3D fuse block for RGBD data.

    We got 1st place on the KITTI depth completion leaderboard.

    Multi-Task Multi-Sensor Fusion for 3D Object Detection
    Ming Liang*, Bin Yang*, Yun Chen, Rui Hu, Raquel Urtasun
    Computer Vision and Pattern Recognition (CVPR), 2019

    Multi-sensor fusion <==> multi-task learning.

    We got 1st place on the KITTI 2D/3D/BEV car detection leaderboards.

    End-to-end Interpretable Neural Motion Planner
    Wenyuan Zeng*, Wenjie Luo*, Simon Suo, Abbas Sadat, Bin Yang, Sergio Casas, Raquel Urtasun
    Computer Vision and Pattern Recognition (CVPR), 2019 (Oral)

    The first end-to-end neural motion planner with perception and prediction interpretations.

    HDNET: Exploiting HD Maps for 3D Object Detection
    Bin Yang, Ming Liang, Raquel Urtasun
    Conference on Robot Learning (CoRL), 2018 (Spotlight)

    A LiDAR based 3D detector that exploits geometric and semantic priors from HD maps (built offline or estimated online).

    We got 1st place on the KITTI BEV car detection leaderboard.

    Deep Continuous Fusion for Multi-Sensor 3D Object Detection
    Ming Liang, Bin Yang, Shenlong Wang, Raquel Urtasun
    European Conference on Computer Vision (ECCV), 2018

    Geometry-aware dense feature fusion for high-performance camera-LiDAR-based 3D object detection.

    We got 1st place on the KITTI BEV car detection leaderboard.

    Learning to Reweight Examples for Robust Deep Learning
    Mengye Ren, Wenyuan Zeng, Bin Yang, Raquel Urtasun
    International Conference on Machine Learning (ICML), 2018 (Oral)
    code

    Online example weighting algorithm for problems with imbalanced classes or noisy labels.

    PIXOR: Real-time 3D Object Detection From Point Clouds
    Bin Yang, Wenjie Luo, Raquel Urtasun
    Computer Vision and Pattern Recognition (CVPR), 2018
    FAQ

    The first state-of-the-art 3D object detector with real-time speed (28 FPS).

    Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net
    Wenjie Luo, Bin Yang, Raquel Urtasun
    Computer Vision and Pattern Recognition (CVPR), 2018 (Oral)
    UofT News

    Joint detection, prediction and tracking from LiDAR with a single CNN.

    SBNet: Sparse Blocks Network for Fast Inference
    Mengye Ren*, Andrei Pokrovsky*, Bin Yang*, Raquel Urtasun
    Computer Vision and Pattern Recognition (CVPR), 2018 (Spotlight)
    Uber Engineering Blog / UofT News / NVIDIA Pioneer Award / code

    Speeding up inference by exploiting sparsity in CNN activations.

    TorontoCity: Seeing the World with a Million Eyes
    Shenlong Wang, Min Bai*, Gellért Máttyus*, Hang Chu*, Wenjie Luo, Bin Yang, Justin Liang, Joel Cheverie, Sanja Fidler, Raquel Urtasun
    International Conference on Computer Vision (ICCV), 2017 (Spotlight)

    City-scale benchmark dataset (covering the full Greater Toronto Area) that contains aerial imagery, panoramas, GoPro footage, and LiDAR, as well as maps with 3D buildings and road information.

    Gated Bi-directional CNN for Object Detection
    Xingyu Zeng, Wanli Ouyang, Bin Yang, Junjie Yan, Xiaogang Wang
    European Conference on Computer Vision (ECCV), 2016
    project page / code

    Capturing multi-scale context with bi-directional message passing.

    Combined with CRAFT, we got 1st place in the ILSVRC 2016 Object Detection Task (technical report accepted by TPAMI 2018).

    T-CNN: Tubelets with Convolutional Neural Networks for Object Detection from Videos
    Kai Kang, Hongsheng Li, Junjie Yan, Xingyu Zeng, Bin Yang, Tong Xiao, Cong Zhang, Zhe Wang, Ruohui Wang, Xiaogang Wang, Wanli Ouyang
    IEEE Transactions on Circuits and Systems for Video Technology, 2018
    slides / code

    Using CRAFT and DeepID-Net as still-image object detectors, we got 1st place in the ILSVRC 2015 Object Detection from Video Task.

    CRAFT Objects from Images
    Bin Yang, Junjie Yan, Zhen Lei, Stan Z. Li
    Computer Vision and Pattern Recognition (CVPR), 2016
    project page / code

    Cascade in proposal! Cascade in detection!

    Convolutional Channel Features
    Bin Yang, Junjie Yan, Zhen Lei, Stan Z. Li
    International Conference on Computer Vision (ICCV), 2015
    project page / video / code

    Convolutional maps + random forests = one approach for diverse tasks.

    Fine-grained Evaluation on Face Detection in the Wild
    Bin Yang*, Junjie Yan*, Zhen Lei, Stan Z. Li
    International Conference on Automatic Face and Gesture Recognition (FG), 2015
    project page

    AP_i = AP on testing faces with attribute_i (otherwise ignored)

    Adaptive Structural Model for Video Based Pedestrian Detection
    Junjie Yan, Bin Yang, Zhen Lei, Stan Z. Li
    Asian Conference on Computer Vision (ACCV), 2014

    An approach that adapts an image-based pedestrian detector to videos.

    Aggregate Channel Features for Multi-view Face Detection
    Bin Yang, Junjie Yan, Zhen Lei, Stan Z. Li
    International Joint Conference on Biometrics (IJCB), 2014 (Oral, Best Student Paper)
    project page

    Real-time face detector with state-of-the-art performance on AFW and FDDB.
    My bachelor's thesis.

    Teaching Assistant

    Winter 2017, CSC411: Machine Learning and Data Mining
    Fall 2016, CSC420: Introduction to Image Understanding
        -  object detection tutorial [slides]

    Talks

    Winter 2017, CSC2541: Topics in Machine Learning - Sport Analytics
        -  intro to convnets [slides] [demo code]
        -  intro to object detection [slides]

    Winter 2018, CSC2548: Machine Learning in Computer Vision
        -  intro to object detection [slides]


    The website template comes from Jon Barron.