Bin Yang
I'm a PhD candidate at the University of Toronto. My advisor is Prof. Raquel Urtasun. I'm also part of Waabi, working on self-driving trucks. I'm a recipient of the Microsoft Research PhD Fellowship (2021) and the NVIDIA Pioneer Award (2018).
I obtained my bachelor's degree from China Agricultural University in 2014. From 2014 to 2016, I was very fortunate to work with Prof. Stan Z. Li, Prof. Zhen Lei, and Dr. Junjie Yan on face recognition and object detection from images and videos.
My general research interest lies in data-driven end-to-end solutions for intelligent agents, with a focus on these topics in the self-driving domain:
Efficient 3D Object Detection: PIXOR, SBNet, PLUMENet
Multi-sensor Fusion: ContFuse, HDNet, MMF, FuseNet, RadarNet
Joint Perception & Prediction: FAF, PnPNet, InteractTransformer, V2VNet
End-to-end Neural Motion Planner: NMP, DSDNet, SA-NMP
Learning-based Simulation: LiDARsim, PnPSim, LiME
Email / Google Scholar / LinkedIn
News
2 papers have been accepted by IROS2021.
1 paper has been accepted by ICRA2021.
I received the 2021 Microsoft Research PhD Fellowship.
Code release for LaneGCN.
1 paper (spotlight) has been accepted by CoRL2020.
5 papers (2 orals) have been accepted by ECCV2020.
1 paper has been accepted by IROS2020.
3 papers (1 oral) have been accepted by CVPR2020.
1 paper has been accepted by ICCV2019.
2 papers (1 oral) have been accepted by CVPR2019.
Code release for SBNet, example reweight.
6 papers (2 orals, 2 spotlights) have been accepted in 2018 by CVPR/ICML/ECCV/CoRL.
PLUMENet: Efficient 3D Object Detection from Stereo Images
Yan Wang, Bin Yang, Rui Hu, Ming Liang, Raquel Urtasun
International Conference on Intelligent Robots and Systems (IROS), 2021
PLUME = pseudo LiDAR feature volume.
We got 1st place on the KITTI BEV detection leaderboard (car, stereo methods without extra training data).
Diverse Complexity Measures for Dataset Curation in Self-driving
Abbas Sadat, Sean Segal, Sergio Casas, James Tu, Bin Yang, Raquel Urtasun, Ersin Yumer
International Conference on Intelligent Robots and Systems (IROS), 2021
Automatic selection of interesting self-driving logs.
Auto4D: Learning to Label 4D Objects from Sequential Point Clouds
Bin Yang, Min Bai, Ming Liang, Wenyuan Zeng, Raquel Urtasun
Technical Report, 2021
Improve low-quality object tracks with fixed size and smoother motion.
Perceive, Attend, and Drive: Learning Spatial Attention for Safe Self-Driving
Bob Wei*, Mengye Ren*, Wenyuan Zeng, Ming Liang, Bin Yang, Raquel Urtasun
International Conference on Robotics and Automation (ICRA), 2021
Learn where to attend for an end-to-end neural motion planner.
Recovering and Simulating Pedestrians in the Wild
Ze Yang, Siva Manivasagam, Ming Liang, Bin Yang, Wei-Chiu Ma, Raquel Urtasun
Conference on Robot Learning (CoRL), 2020 (Spotlight)
video
Pedestrian shape and pose reconstruction from in-the-wild multi-sensor data.
Learning Lane Graph Representations for Motion Forecasting
Ming Liang, Bin Yang, Rui Hu, Yun Chen, Renjie Liao, Song Feng, Raquel Urtasun
European Conference on Computer Vision (ECCV), 2020 (Oral)
slides / code
A new map representation (lane graph) and a new operator (LaneConv) on it.
We got 1st place on the Argoverse motion forecasting leaderboard (ADE/FDE metrics).
V2VNet: Vehicle-to-Vehicle Communication for Joint Perception and Prediction
Tsun-Hsuan Wang, Siva Manivasagam, Ming Liang, Bin Yang, Wenyuan Zeng, Raquel Urtasun
European Conference on Computer Vision (ECCV), 2020 (Oral)
Model vehicle-to-vehicle communication via a graph neural network.
RadarNet: Exploiting Radar for Robust Perception of Dynamic Objects
Bin Yang*, Runsheng Guo*, Ming Liang, Sergio Casas, Raquel Urtasun
European Conference on Computer Vision (ECCV), 2020
slides
Multi-level fusion of LiDAR & Radar.
Testing the Safety of Self-driving Vehicles by Simulating Perception and Prediction
Kelvin Wong*, Qiang Zhang*, Ming Liang, Bin Yang, Renjie Liao, Abbas Sadat, Raquel Urtasun
European Conference on Computer Vision (ECCV), 2020
PnPSim(scene) = PnPNet(scene, sensor data)
DSDNet: Deep Structured self-Driving Network
Wenyuan Zeng, Shenlong Wang, Renjie Liao, Yun Chen, Bin Yang, Raquel Urtasun
European Conference on Computer Vision (ECCV), 2020
Deep structured model for probabilistic multimodal prediction.
End-to-end Contextual Perception and Prediction with Interaction Transformer
Lingyun Luke Li, Bin Yang, Ming Liang, Wenyuan Zeng, Mengye Ren, Sean Segal, Raquel Urtasun
International Conference on Intelligent Robots and Systems (IROS), 2020
Adapt Transformer to model multi-agent interactions in trajectory prediction.
PnPNet: End-to-End Perception and Prediction with Tracking in the Loop
Ming Liang*, Bin Yang*, Wenyuan Zeng, Yun Chen, Rui Hu, Sergio Casas, Raquel Urtasun
Computer Vision and Pattern Recognition (CVPR), 2020
slides
The first P&P model that solves detect->track->predict end-to-end.
LiDARsim: Realistic LiDAR Simulation by Leveraging the Real World
Siva Manivasagam, Shenlong Wang, Kelvin Wong, Wenyuan Zeng, Mikita Sazanovich, Shuhan Tan, Bin Yang, Wei-Chiu Ma, Raquel Urtasun
Computer Vision and Pattern Recognition (CVPR), 2020 (Oral)
Realistic sensor simulation of LiDAR for closed-loop evaluation.
Physically Realizable Adversarial Examples for LiDAR Object Detection
James Tu, Mengye Ren, Siva Manivasagam, Ming Liang, Bin Yang, Richard Du, Frank Cheng, Raquel Urtasun
Computer Vision and Pattern Recognition (CVPR), 2020
Universal rooftop attack that hides vehicles from LiDAR-based object detectors.
Learning Joint 2D-3D Representations for Depth Completion
Yun Chen, Bin Yang, Ming Liang, Raquel Urtasun
International Conference on Computer Vision (ICCV), 2019
We propose the 2D-3D fuse block for RGBD data.
We got 1st place on the KITTI depth completion leaderboard.
Multi-Task Multi-Sensor Fusion for 3D Object Detection
Ming Liang*, Bin Yang*, Yun Chen, Rui Hu, Raquel Urtasun
Computer Vision and Pattern Recognition (CVPR), 2019
Multi-sensor fusion <==> multi-task learning.
We got 1st place on the KITTI 2D/3D/BEV car detection leaderboards.
End-to-end Interpretable Neural Motion Planner
Wenyuan Zeng*, Wenjie Luo*, Simon Suo, Abbas Sadat, Bin Yang, Sergio Casas, Raquel Urtasun
Computer Vision and Pattern Recognition (CVPR), 2019 (Oral)
The first end-to-end neural motion planner with perception and prediction interpretations.
HDNET: Exploiting HD Maps for 3D Object Detection
Bin Yang, Ming Liang, Raquel Urtasun
2nd Conference on Robot Learning (CoRL), 2018 (Spotlight)
A LiDAR-based 3D detector that exploits geometric and semantic priors from HD maps (built offline or estimated online).
We got 1st place on the KITTI BEV car detection leaderboard.
Deep Continuous Fusion for Multi-Sensor 3D Object Detection
Ming Liang, Bin Yang, Shenlong Wang, Raquel Urtasun
European Conference on Computer Vision (ECCV), 2018
Geometry-aware dense feature fusion for high-performance camera-LiDAR-based 3D object detection.
We got 1st place on the KITTI BEV car detection leaderboard.
Learning to Reweight Examples for Robust Deep Learning
Mengye Ren, Wenyuan Zeng, Bin Yang, Raquel Urtasun
International Conference on Machine Learning (ICML), 2018 (Oral)
code
Online example weighting algorithm for problems with imbalanced classes or noisy labels.
PIXOR: Real-time 3D Object Detection From Point Clouds
Bin Yang, Wenjie Luo, Raquel Urtasun
Computer Vision and Pattern Recognition (CVPR), 2018
FAQ
The first state-of-the-art 3D object detector with real-time speed (28 FPS).
Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net
Wenjie Luo, Bin Yang, Raquel Urtasun
Computer Vision and Pattern Recognition (CVPR), 2018 (Oral)
UofT News
Joint detection, prediction and tracking from LiDAR with a single CNN.
SBNet: Sparse Blocks Network for Fast Inference
Mengye Ren*, Andrei Pokrovsky*, Bin Yang*, Raquel Urtasun
Computer Vision and Pattern Recognition (CVPR), 2018 (Spotlight)
Uber Engineering Blog / UofT News / NVIDIA Pioneer Award / code
Speeding up inference by exploiting sparsity in CNN activations.
TorontoCity: Seeing the World with a Million Eyes
Shenlong Wang, Min Bai*, Gellért Máttyus*, Hang Chu*, Wenjie Luo, Bin Yang, Justin Liang, Joel Cheverie, Sanja Fidler, Raquel Urtasun
International Conference on Computer Vision (ICCV), 2017 (Spotlight)
City-scale benchmark dataset (covering the full Greater Toronto Area) that contains aerial images, panoramas, GoPro and LiDAR data, as well as maps with 3D buildings and road information.
Gated Bi-directional CNN for Object Detection
Xingyu Zeng, Wanli Ouyang, Bin Yang, Junjie Yan, Xiaogang Wang
European Conference on Computer Vision (ECCV), 2016
project page / code
Capturing multi-scale context with bi-directional message passing.
Combined with CRAFT, we got 1st place in the ILSVRC 2016 Object Detection Task (technical report accepted by TPAMI 2018).
T-CNN: Tubelets with Convolutional Neural Networks for Object Detection from Videos
Kai Kang, Hongsheng Li, Junjie Yan, Xingyu Zeng, Bin Yang, Tong Xiao, Cong Zhang, Zhe Wang, Ruohui Wang, Xiaogang Wang, Wanli Ouyang
IEEE Transactions on Circuits and Systems for Video Technology, 2018
slides / code
Using CRAFT and DeepID-Net as still-image object detectors, we got 1st place in the ILSVRC 2015 Object Detection from Video Task.
CRAFT Objects from Images
Bin Yang, Junjie Yan, Zhen Lei, Stan Z. Li
Computer Vision and Pattern Recognition (CVPR), 2016
project page / code
Cascade in proposal! Cascade in detection!
Convolutional Channel Features
Bin Yang, Junjie Yan, Zhen Lei, Stan Z. Li
International Conference on Computer Vision (ICCV), 2015
project page / video / code
Convolutional maps + random forests = one approach for diverse tasks.
Fine-grained Evaluation on Face Detection in the Wild
Bin Yang*, Junjie Yan*, Zhen Lei, Stan Z. Li
International Conference on Automatic Face and Gesture Recognition (FG), 2015
project page
AP_i = AP on testing faces with attribute_i (otherwise ignored)
Adaptive Structural Model for Video Based Pedestrian Detection
Junjie Yan, Bin Yang, Zhen Lei, Stan Z. Li
Asian Conference on Computer Vision (ACCV), 2014
An approach that adapts an image-based pedestrian detector to videos.
Aggregate Channel Features for Multi-view Face Detection
Bin Yang, Junjie Yan, Zhen Lei, Stan Z. Li
International Joint Conference on Biometrics (IJCB), 2014 (Oral, Best Student Paper)
project page
Real-time face detector with state-of-the-art performance on AFW and FDDB. My bachelor's thesis.