Bin Yang  

I'm a PhD candidate at the University of Toronto, advised by Prof. Raquel Urtasun. I'm also part of Waabi, working on self-driving trucks. I'm a recipient of the Microsoft Research PhD Fellowship (2021) and the NVIDIA Pioneer Award (2018).

I obtained my bachelor's degree from China Agricultural University in 2014. From 2014 to 2016, I was very fortunate to work with Prof. Stan Z. Li, Prof. Zhen Lei, and Dr. Junjie Yan on face recognition and object detection from images and videos.

My general research interest lies in data-driven end-to-end solutions for intelligent agents, with a focus on these topics in the self-driving domain:

  • Efficient 3D Object Detection: PIXOR, SBNet, PLUMENet
  • Multi-sensor Fusion: ContFuse, HDNet, MMF, FuseNet, RadarNet
  • Joint Perception & Prediction: FAF, PnPNet, InteractTransformer, V2VNet
  • End-to-end Neural Motion Planner: NMP, DSDNet, SA-NMP
  • Learning-based Simulation: LiDARSim, PnPSim, LiME

    Email  /  Google Scholar  /  LinkedIn

    News

  • 2 papers have been accepted by IROS 2021.
  • 1 paper has been accepted by ICRA 2021.
  • I received the 2021 Microsoft Research PhD Fellowship.
  • Code release for LaneGCN.
  • 1 paper (spotlight) has been accepted by CoRL 2020.
  • 5 papers (2 orals) have been accepted by ECCV 2020.
  • 1 paper has been accepted by IROS 2020.
  • 3 papers (1 oral) have been accepted by CVPR 2020.
  • 1 paper has been accepted by ICCV 2019.
  • 2 papers (1 oral) have been accepted by CVPR 2019.
  • Code release for SBNet and example reweighting.
  • 6 papers (2 orals, 2 spotlights) have been accepted in 2018 by CVPR/ICML/ECCV/CoRL.

    Publications

    PLUMENet: Efficient 3D Object Detection from Stereo Images
    Yan Wang, Bin Yang, Rui Hu, Ming Liang, Raquel Urtasun
    International Conference on Intelligent Robots and Systems (IROS), 2021

    PLUME = pseudo-LiDAR feature volume.

    We got 1st place on the KITTI BEV detection leaderboard (car, among stereo methods without extra training data).

    Diverse Complexity Measures for Dataset Curation in Self-driving
    Abbas Sadat, Sean Segal, Sergio Casas, James Tu, Bin Yang, Raquel Urtasun, Ersin Yumer
    International Conference on Intelligent Robots and Systems (IROS), 2021

    Automatic selection of interesting self-driving logs.

    Auto4D: Learning to Label 4D Objects from Sequential Point Clouds
    Bin Yang, Min Bai, Ming Liang, Wenyuan Zeng, Raquel Urtasun
    Technical Report, 2021

    Improve low-quality object tracks by enforcing a fixed object size and smoother motion.

    Perceive, Attend, and Drive: Learning Spatial Attention for Safe Self-Driving
    Bob Wei*, Mengye Ren*, Wenyuan Zeng, Ming Liang, Bin Yang, Raquel Urtasun
    International Conference on Robotics and Automation (ICRA), 2021

    Learn where to attend for an end-to-end neural motion planner.

    Recovering and Simulating Pedestrians in the Wild
    Ze Yang, Siva Manivasagam, Ming Liang, Bin Yang, Wei-Chiu Ma, Raquel Urtasun
    Conference on Robot Learning (CoRL), 2020 (Spotlight)
    video

    Pedestrian shape and pose reconstruction from in-the-wild multi-sensor data.

    Learning Lane Graph Representations for Motion Forecasting
    Ming Liang, Bin Yang, Rui Hu, Yun Chen, Renjie Liao, Song Feng, Raquel Urtasun
    European Conference on Computer Vision (ECCV), 2020 (Oral)
    slides / code

    A new map representation (the lane graph) and a new operator (LaneConv) that acts on it.

    We got 1st place on the Argoverse motion forecasting leaderboard (ADE/FDE metrics).

    V2VNet: Vehicle-to-Vehicle Communication for Joint Perception and Prediction
    Tsun-Hsuan Wang, Siva Manivasagam, Ming Liang, Bin Yang, Wenyuan Zeng, Raquel Urtasun
    European Conference on Computer Vision (ECCV), 2020 (Oral)

    Model vehicle-to-vehicle communication with a graph neural network.

    RadarNet: Exploiting Radar for Robust Perception of Dynamic Objects
    Bin Yang*, Runsheng Guo*, Ming Liang, Sergio Casas, Raquel Urtasun
    European Conference on Computer Vision (ECCV), 2020
    slides

    Multi-level fusion of LiDAR & Radar.

    Testing the Safety of Self-driving Vehicles by Simulating Perception and Prediction
    Kelvin Wong*, Qiang Zhang*, Ming Liang, Bin Yang, Renjie Liao, Abbas Sadat, Raquel Urtasun
    European Conference on Computer Vision (ECCV), 2020

    PnPSim(scene) ≈ PnPNet(sensor data of the scene): simulate the outputs of perception and prediction directly from the scene.

    DSDNet: Deep Structured self-Driving Network
    Wenyuan Zeng, Shenlong Wang, Renjie Liao, Yun Chen, Bin Yang, Raquel Urtasun
    European Conference on Computer Vision (ECCV), 2020

    Deep structured model for probabilistic multimodal prediction.

    End-to-end Contextual Perception and Prediction with Interaction Transformer
    Lingyun Luke Li, Bin Yang, Ming Liang, Wenyuan Zeng, Mengye Ren, Sean Segal, Raquel Urtasun
    International Conference on Intelligent Robots and Systems (IROS), 2020

    Adapt the Transformer to model multi-agent interactions in trajectory prediction.

    PnPNet: End-to-End Perception and Prediction with Tracking in the Loop
    Ming Liang*, Bin Yang*, Wenyuan Zeng, Yun Chen, Rui Hu, Sergio Casas, Raquel Urtasun
    Computer Vision and Pattern Recognition (CVPR), 2020
    slides

    The first P&P model that solves detect->track->predict end-to-end.

    LiDARsim: Realistic LiDAR Simulation by Leveraging the Real World
    Siva Manivasagam, Shenlong Wang, Kelvin Wong, Wenyuan Zeng, Mikita Sazanovich, Shuhan Tan, Bin Yang, Wei-Chiu Ma, Raquel Urtasun
    Computer Vision and Pattern Recognition (CVPR), 2020 (Oral)

    Realistic sensor simulation of LiDAR for closed-loop evaluation.

    Physically Realizable Adversarial Examples for LiDAR Object Detection
    James Tu, Mengye Ren, Siva Manivasagam, Ming Liang, Bin Yang, Richard Du, Frank Cheng, Raquel Urtasun
    Computer Vision and Pattern Recognition (CVPR), 2020

    Universal rooftop attack that hides vehicles from LiDAR-based object detectors.

    Learning Joint 2D-3D Representations for Depth Completion
    Yun Chen, Bin Yang, Ming Liang, Raquel Urtasun
    International Conference on Computer Vision (ICCV), 2019

    We propose the 2D-3D fuse block for RGBD data.

    We got 1st place on the KITTI depth completion leaderboard.

    Multi-Task Multi-Sensor Fusion for 3D Object Detection
    Ming Liang*, Bin Yang*, Yun Chen, Rui Hu, Raquel Urtasun
    Computer Vision and Pattern Recognition (CVPR), 2019

    Multi-sensor fusion <==> multi-task learning.

    We got 1st place on the KITTI 2D/3D/BEV car detection leaderboards.

    End-to-end Interpretable Neural Motion Planner
    Wenyuan Zeng*, Wenjie Luo*, Simon Suo, Abbas Sadat, Bin Yang, Sergio Casas, Raquel Urtasun
    Computer Vision and Pattern Recognition (CVPR), 2019 (Oral)

    The first end-to-end neural motion planner with perception and prediction interpretations.

    HDNET: Exploiting HD Maps for 3D Object Detection
    Bin Yang, Ming Liang, Raquel Urtasun
    Conference on Robot Learning (CoRL), 2018 (Spotlight)

    A LiDAR based 3D detector that exploits geometric and semantic priors from HD maps (built offline or estimated online).

    We got 1st place on the KITTI BEV car detection leaderboard.

    Deep Continuous Fusion for Multi-Sensor 3D Object Detection
    Ming Liang, Bin Yang, Shenlong Wang, Raquel Urtasun
    European Conference on Computer Vision (ECCV), 2018

    Geometry-aware dense feature fusion for high-performance camera-LiDAR-based 3D object detection.

    We got 1st place on the KITTI BEV car detection leaderboard.

    Learning to Reweight Examples for Robust Deep Learning
    Mengye Ren, Wenyuan Zeng, Bin Yang, Raquel Urtasun
    International Conference on Machine Learning (ICML), 2018 (Oral)
    code

    Online example weighting algorithm for problems with imbalanced classes or noisy labels.

    PIXOR: Real-time 3D Object Detection From Point Clouds
    Bin Yang, Wenjie Luo, Raquel Urtasun
    Computer Vision and Pattern Recognition (CVPR), 2018
    FAQ

    The first state-of-the-art 3D object detector with real-time speed (28 FPS).

    Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net
    Wenjie Luo, Bin Yang, Raquel Urtasun
    Computer Vision and Pattern Recognition (CVPR), 2018 (Oral)
    UofT News

    Joint detection, prediction and tracking from LiDAR with a single CNN.

    SBNet: Sparse Blocks Network for Fast Inference
    Mengye Ren*, Andrei Pokrovsky*, Bin Yang*, Raquel Urtasun
    Computer Vision and Pattern Recognition (CVPR), 2018 (Spotlight)
    Uber Engineering Blog / UofT News / NVIDIA Pioneer Award / code

    Speeding up inference by exploiting sparsity in CNN activations.

    TorontoCity: Seeing the World with a Million Eyes
    Shenlong Wang, Min Bai*, Gellért Máttyus*, Hang Chu*, Wenjie Luo, Bin Yang, Justin Liang, Joel Cheverie, Sanja Fidler, Raquel Urtasun
    International Conference on Computer Vision (ICCV), 2017 (Spotlight)

    City-scale benchmark dataset (covering the full Greater Toronto Area) that contains aerial imagery, panoramas, GoPro footage, and LiDAR, as well as maps with 3D buildings and road information.

    Gated Bi-directional CNN for Object Detection
    Xingyu Zeng, Wanli Ouyang, Bin Yang, Junjie Yan, Xiaogang Wang
    European Conference on Computer Vision (ECCV), 2016
    project page / code

    Capturing multi-scale context with bi-directional message passing.

    Combined with CRAFT, we got 1st place in the ILSVRC 2016 Object Detection Task (technical report accepted by TPAMI 2018).

    T-CNN: Tubelets with Convolutional Neural Networks for Object Detection from Videos
    Kai Kang, Hongsheng Li, Junjie Yan, Xingyu Zeng, Bin Yang, Tong Xiao, Cong Zhang, Zhe Wang, Ruohui Wang, Xiaogang Wang, Wanli Ouyang
    IEEE Transactions on Circuits and Systems for Video Technology, 2018
    slides / code

    Using CRAFT and DeepID-Net as still-image object detectors, we got 1st place in the ILSVRC 2015 Object Detection from Video Task.

    CRAFT Objects from Images
    Bin Yang, Junjie Yan, Zhen Lei, Stan Z. Li
    Computer Vision and Pattern Recognition (CVPR), 2016
    project page / code

    Cascade in proposal! Cascade in detection!

    Convolutional Channel Features
    Bin Yang, Junjie Yan, Zhen Lei, Stan Z. Li
    International Conference on Computer Vision (ICCV), 2015
    project page / video / code

    Convolutional maps + random forests = one approach for diverse tasks.

    Fine-grained Evaluation on Face Detection in the Wild
    Bin Yang*, Junjie Yan*, Zhen Lei, Stan Z. Li
    International Conference on Automatic Face and Gesture Recognition (FG), 2015
    project page

    AP_i = AP on testing faces with attribute_i (otherwise ignored)

    Adaptive Structural Model for Video Based Pedestrian Detection
    Junjie Yan, Bin Yang, Zhen Lei, Stan Z. Li
    Asian Conference on Computer Vision (ACCV), 2014

    An approach that adapts an image-based pedestrian detector to videos.

    Aggregate Channel Features for Multi-view Face Detection
    Bin Yang, Junjie Yan, Zhen Lei, Stan Z. Li
    International Joint Conference on Biometrics (IJCB), 2014 (Oral, Best Student Paper)
    project page

    Real-time face detector with state-of-the-art performance on AFW and FDDB.
    My bachelor's thesis.

    Teaching Assistant

    Winter 2017, CSC411: Machine Learning and Data Mining
    Fall 2016, CSC420: Introduction to Image Understanding
        -  object detection tutorial [slides]

    Talks

    Winter 2017, CSC2541: Topics in Machine Learning - Sport Analytics
        -  intro to convnets [slides] [demo code]
        -  intro to object detection [slides]

    Winter 2018, CSC2548: Machine Learning in Computer Vision
        -  intro to object detection [slides]


    The website template comes from Jon Barron.