|
Self-Supervised Representation Learning from Flow Equivariance
Yuwen Xiong,
Mengye Ren,
Wenyuan Zeng,
Raquel Urtasun
International Conference on Computer Vision (ICCV),
Montreal, 2021
paper
Self-supervised learning from video streams of complex scenes.
Self-supervised representation learning is able to learn semantically meaningful features; however,
much of its recent success relies on multiple crops of an image with very few objects. Instead of
learning view-invariant representations from simple images, humans learn representations in a complex
world with changing scenes by observing object movement, deformation, pose variation, and ego motion.
Motivated by this ability, we present a new self-supervised representation learning framework that can
be directly deployed on a video stream of complex scenes with many moving objects. Our framework
features a simple flow equivariance objective that encourages the network to predict the features of
another frame by applying a flow transformation to the features of the current frame. Our
representations, learned from high-resolution raw video, can be readily used for downstream tasks on
static images. Readout experiments on challenging semantic segmentation, instance segmentation, and
object detection benchmarks show that we are able to outperform representations obtained from previous
state-of-the-art methods including SimCLR and BYOL.
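To make the objective concrete, here is a minimal PyTorch-style sketch of a flow-equivariance loss: the features of frame t are warped with the optical flow and regressed against the features of frame t+1. The helper names, the flow channel order, and the stop-gradient on the target are illustrative assumptions, not the paper's exact implementation.

    # Illustrative sketch only; flow is assumed to be at feature resolution,
    # with channel 0 = x-displacement and channel 1 = y-displacement.
    import torch
    import torch.nn.functional as F

    def warp_features(feat, flow):
        """Warp feat (B, C, H, W) by flow (B, 2, H, W), in feature-grid units."""
        B, C, H, W = feat.shape
        ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
        base = torch.stack((xs, ys)).float().to(feat.device)      # (2, H, W)
        coords = base.unsqueeze(0) + flow                         # (B, 2, H, W)
        # Normalize to [-1, 1] as required by grid_sample (x first, then y).
        gx = 2.0 * coords[:, 0] / (W - 1) - 1.0
        gy = 2.0 * coords[:, 1] / (H - 1) - 1.0
        return F.grid_sample(feat, torch.stack((gx, gy), dim=-1),
                             align_corners=True)

    def flow_equivariance_loss(encoder, frame_t, frame_t1, flow):
        feat_t1 = encoder(frame_t1).detach()   # assumed stop-gradient target
        pred_t1 = warp_features(encoder(frame_t), flow)
        return F.mse_loss(pred_t1, feat_t1)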
|
|
LaneRCNN: Distributed Representations for Graph-Centric Motion Forecasting
Wenyuan Zeng,
Ming Liang,
Renjie Liao,
Raquel Urtasun
International Conference on Intelligent Robots and Systems (IROS),
Prague, 2021
Ranked 1st place on the Argoverse Motion Forecasting Leaderboard (Nov 16, 2020).
paper
Graph-centric motion forecasting model for autonomous driving.
Forecasting the future behaviors of dynamic actors is an important task in many robotics applications
such as self-driving. It is extremely challenging because actors have latent intentions and their
trajectories are governed by complex interactions among themselves, other actors, and the map.
In this paper, we propose LaneRCNN, a graph-centric motion forecasting model. Importantly, relying on
a specially designed graph encoder, we learn a local lane graph representation per actor (LaneRoI) to
encode its past motions and the local map topology. We further develop an interaction module which
permits efficient message passing among local graph representations within a shared global lane graph.
Moreover, we parameterize the output trajectories based on lane graphs, a parameterization more
amenable to prediction. Our LaneRCNN captures actor-to-actor and actor-to-map relations in a
distributed and map-aware manner. We demonstrate the effectiveness of our approach on the large-scale
Argoverse Motion Forecasting Benchmark, where we achieve 1st place on the leaderboard and
significantly outperform the previous best results.
|
|
Safety-Oriented Pedestrian Motion and Scene Occupancy Forecasting
Katie Luo*,
Sergio Casas*,
Xinchen Yan,
Yuwen Xiong,
Wenyuan Zeng,
Raquel Urtasun
International Conference on Intelligent Robots and Systems (IROS),
Prague, 2021
paper
Hybrid instance-based and instance-free approach to pedestrian behavior prediction.
In this paper, we address the important problem in self-driving of forecasting multi-pedestrian motion
and their shared scene occupancy map, critical for safe navigation. Our contributions are two-fold.
First, we advocate for predicting both the individual motions as well as the scene occupancy map in
order to effectively deal with missing detections caused by postprocessing, e.g., confidence
thresholding and non-maximum suppression. Second, we propose a Scene-Actor Graph Neural Network
(SA-GNN) which preserves the relative spatial information of pedestrians via 2D convolution, and
captures the interactions among pedestrians within the same scene, including those that have not been
detected, via message passing. On two large-scale real-world datasets, nuScenes and ATG4D, we showcase
that our scene-occupancy predictions are more accurate and better calibrated than those from
state-of-the-art motion forecasting methods, while also matching their performance in pedestrian
motion forecasting metrics.
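A minimal sketch of the spatially grounded message passing this describes: actor features are scattered onto a bird's-eye-view grid at their locations, 2D convolutions propagate messages while preserving relative positions, and updated features are read back per actor. Shapes and module names are hypothetical simplifications of SA-GNN.

    import torch
    import torch.nn as nn

    class SpatialMessagePassing(nn.Module):
        def __init__(self, channels=64):
            super().__init__()
            # 2D convolutions pass messages while preserving relative positions.
            self.conv = nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
                nn.Conv2d(channels, channels, 3, padding=1),
            )

        def forward(self, actor_feats, actor_cells, grid_hw=(100, 100)):
            # actor_feats: (N, C) per-pedestrian features
            # actor_cells: (N, 2) long tensor of (row, col) BEV cells
            # (assumes one actor per cell for simplicity)
            N, C = actor_feats.shape
            H, W = grid_hw
            grid = actor_feats.new_zeros(1, C, H, W)
            grid[0, :, actor_cells[:, 0], actor_cells[:, 1]] = actor_feats.t()
            grid = self.conv(grid)  # message passing via spatial convolution
            # Read the updated features back at each actor's location.
            return grid[0, :, actor_cells[:, 0], actor_cells[:, 1]].t()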
|
|
Auto4D: Learning to Label 4D Objects from Sequential Point Clouds
Bin Yang,
Min Bai,
Ming Liang,
Wenyuan Zeng,
Raquel Urtasun
Technical Report, 2021
paper
Improving low-quality object tracks by enforcing a fixed object size and a smoother motion path.
In the past few years we have seen great advances in object perception (particularly in 4D space-time
dimensions) thanks to deep learning methods. However, they typically rely on large amounts of
high-quality labels to achieve good performance, which often require time-consuming and expensive work
by human annotators. To address this we propose an automatic annotation pipeline that generates
accurate object trajectories in 3D space (i.e., 4D labels) from LiDAR point clouds. The key idea is to
decompose the 4D object label into two parts: the object size in 3D, which is fixed through time for
rigid objects, and the motion path describing the evolution of the object's pose through time. Instead
of generating a series of labels in one shot, we adopt an iterative refinement process in which object
detections generated online are tracked through time and used as the initialization. Given the cheap but noisy
input, our model produces higher quality 4D labels by re-estimating the object size and smoothing the
motion path, where the improvement is achieved by exploiting aggregated observations and motion cues
over the entire trajectory. We validate the proposed method on a large-scale driving dataset and show
a 25% reduction of human annotation efforts. We also showcase the benefits of our approach in the
annotator-in-the-loop setting.
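As a toy illustration of the size/motion decomposition (not the paper's learned refinement modules), one can already improve a noisy track by re-estimating a single size over the whole trajectory and smoothing the motion path:

    import numpy as np

    def refine_track(boxes):
        """boxes: (T, 7) noisy per-frame boxes [x, y, z, l, w, h, yaw]."""
        refined = boxes.copy()
        # 1) Object size is fixed through time for rigid objects:
        #    re-estimate a single size from the whole trajectory.
        refined[:, 3:6] = np.median(boxes[:, 3:6], axis=0)
        # 2) Smooth the motion path (box centers) with a moving average.
        k = 5
        kernel = np.ones(k) / k
        for d in range(3):
            refined[:, d] = np.convolve(
                np.pad(boxes[:, d], k // 2, mode="edge"), kernel, mode="valid")
        return refined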
|
|
Network Automatic Pruning: Start NAP and Take a Nap
Wenyuan Zeng,
Yuwen Xiong,
Raquel Urtasun
Technical Report, 2021
paper
Automatically prune your network with almost no hyper-parameter tuning.
Network pruning can significantly reduce the computation and memory footprint of large neural
networks. To achieve a good trade-off between model size and performance, popular pruning techniques
usually rely on hand-crafted heuristics and require manually setting the compression ratio for each
layer. This process is typically time-consuming and requires expert knowledge to achieve good results.
In this paper, we propose NAP, a unified and automatic pruning framework for both fine-grained and
structured pruning. It identifies unimportant components of a network and automatically decides
appropriate compression ratios for different layers, based on a theoretically sound criterion. Towards
this goal, NAP uses an efficient approximation of the Hessian to evaluate the importance of each
component, based on the Kronecker-factored Approximate Curvature (K-FAC) method. Despite being simple
to use, NAP outperforms previous pruning methods by large margins. For fine-grained pruning, NAP
compresses AlexNet and VGG16 by 25x, and ResNet-50 by 6.7x, without loss of accuracy on ImageNet. For
structured pruning (e.g., channel pruning), it reduces the FLOPs of VGG16 by 5.4x and of ResNet-50 by
2.3x with only a 1% accuracy drop. More importantly, the method requires almost no hyper-parameter
tuning and no expert knowledge. You can start NAP and then take a nap!
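For intuition, here is a simplified sketch of a second-order pruning criterion. Note the substitution: NAP uses a Kronecker-factored (K-FAC) Hessian approximation, whereas this sketch uses a diagonal Fisher approximation to stay short.

    import torch

    def pruning_scores(params, grads):
        """Saliency of removing weight i: s_i ~= 0.5 * H_ii * w_i^2, with
        H_ii approximated by the squared gradient (diagonal Fisher)."""
        scores = []
        for w, g in zip(params, grads):
            h_diag = g.pow(2)   # diagonal Fisher approximation of the Hessian
            scores.append(0.5 * h_diag * w.pow(2))
        return scores           # prune the globally smallest scores

Ranking these scores globally across layers is what removes the need to hand-pick per-layer compression ratios.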
|
|
Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks
Julieta Martinez*,
Jashan Shewakramani*,
Ting Wei Liu*,
Ioan Andrei Bârsan,
Wenyuan Zeng,
Raquel Urtasun
Computer Vision and Pattern Recognition
(CVPR), 2021.
Oral Presentation
paper
Boosting compression ratios by permuting, quantizing, and then fine-tuning.
Compressing large neural networks is an important step for their deployment in resource-constrained
computational platforms. In this context, vector quantization is an appealing framework that expresses
multiple parameters using a single code, and has recently achieved state-of-the-art network
compression on a range of core vision and natural language processing tasks. Key to the success of
vector quantization is deciding which parameter groups should be compressed together. Previous work
has relied on heuristics that group the spatial dimension of individual convolutional filters, but a
general solution remains unaddressed. Such a solution is particularly desirable for pointwise convolutions (which dominate
modern architectures), linear layers (which have no notion of spatial dimension), and convolutions
(when more than one filter is compressed to the same codeword). In this paper we make the observation
that the weights of two adjacent layers can be permuted while expressing the same function. We then
establish a connection to rate-distortion theory and search for permutations that result in networks
that are easier to compress. Finally, we rely on an annealed quantization algorithm to better compress
the network and achieve higher final accuracy. We show results on image classification, object
detection, and segmentation, reducing the gap with the uncompressed model by 40 to 70% with respect to
the current state of the art.
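The key observation is easy to verify numerically: permuting the output channels of one layer and compensating in the next layer's input channels leaves the network's function unchanged, because element-wise nonlinearities commute with permutations. A self-contained check with two dense layers (illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
    W2 = rng.normal(size=(3, 8))
    x = rng.normal(size=4)

    relu = lambda v: np.maximum(v, 0)
    y = W2 @ relu(W1 @ x + b1)

    P = rng.permutation(8)                       # permute layer-1 output channels
    y_perm = W2[:, P] @ relu(W1[P] @ x + b1[P])  # compensate in layer 2's inputs

    assert np.allclose(y, y_perm)  # same function, different weight layout

The paper then searches over such permutations for ones whose weight groups are easier to vector-quantize, guided by a rate-distortion objective.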
|
|
Perceive, Attend, and Drive: Learning Spatial Attention for Safe
Self-Driving
Bob Wei*,
Mengye Ren*,
Wenyuan Zeng,
Ming Liang,
Bin Yang,
Raquel Urtasun
International Conference on Robotics and Automation
(ICRA), Xi'an, 2021.
paper
End-to-end neural motion planner with adaptive attention.
In this paper, we propose an end-to-end self-driving network featuring a sparse attention module that
learns to automatically attend to important regions of the input. The attention module specifically
targets motion planning, whereas prior literature only applied attention in perception tasks. Learning
an attention mask directly targeted for motion planning significantly improves the planner safety by
performing more focused computation. Furthermore, visualizing the attention improves interpretability
of end-to-end self-driving.
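A minimal sketch of a learned sparse spatial attention mask over bird's-eye-view features (hypothetical module; the paper's mask generator, sparsity mechanism, and planning head differ):

    import torch
    import torch.nn as nn

    class SparseSpatialAttention(nn.Module):
        def __init__(self, channels=128, keep_ratio=0.25):
            super().__init__()
            self.score = nn.Conv2d(channels, 1, kernel_size=1)
            self.keep_ratio = keep_ratio

        def forward(self, feats):                    # feats: (B, C, H, W)
            B, C, H, W = feats.shape
            scores = self.score(feats).view(B, -1)   # (B, H*W)
            k = max(1, int(self.keep_ratio * H * W))
            thresh = scores.topk(k, dim=1).values[:, -1:]  # k-th largest score
            mask = (scores >= thresh).float().view(B, 1, H, W)
            # In practice a soft or straight-through relaxation keeps the
            # hard mask differentiable during training.
            return feats * mask   # downstream planning computes on attended regions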
|
|
Deep Structured Reactive Planning
Jerry Liu,
Wenyuan Zeng,
Raquel Urtasun,
Ersin Yumer
International Conference on Robotics and Automation
(ICRA), Xi'an, 2021.
paper
Joint prediction and planning with reactive behaviors.
An intelligent agent operating in the real world must balance achieving its goal with maintaining the
safety and comfort of not only itself, but also other participants in the surrounding scene. This
requires jointly reasoning about the behavior of other actors while deciding its own actions, as these
two processes are inherently intertwined: a vehicle will yield to us if we decide to proceed first at
the intersection, but will proceed first if we decide to yield. However, this is not captured in most
self-driving pipelines, where planning follows prediction. In this paper we propose a novel
data-driven, reactive planning objective which allows a self-driving vehicle to jointly reason about
its own plans as well as how other actors will react to them. We formulate the problem as an
energy-based deep structured model that is learned from observational data and encodes both the
planning and prediction problems. Through simulations based on both real-world driving and
synthetically generated dense traffic, we demonstrate that our reactive model outperforms a
non-reactive variant in successfully completing highly complex maneuvers (lane merges/turns in
traffic) faster, without trading off collision rate.
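The intersection example can be made concrete with a toy discretized energy (the numbers are made up; the paper learns the joint energy from data). A non-reactive planner scores ego plans against a fixed forecast, while a reactive planner lets the other actor best-respond to each candidate ego plan:

    import numpy as np

    # Joint energy E[ego_plan, other_plan]; rows/cols index {proceed, yield}.
    E = np.array([[10.0, 1.0],    # ego proceeds: costly only if the other also proceeds
                  [ 3.0, 4.0]])   # ego yields

    # Non-reactive: the forecast of the other actor ignores the ego's plan.
    other_pred = 0                                   # forecast: the other proceeds
    non_reactive = int(np.argmin(E[:, other_pred]))  # -> 1: ego yields

    # Reactive: for each ego plan, the other actor best-responds to that plan.
    reactive = int(np.argmin([E[i].min() for i in range(2)]))  # -> 0: ego proceeds

    print(non_reactive, reactive)

Here the reactive planner chooses to proceed because it anticipates that the other vehicle will yield, exactly the intersection behavior described above.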
|
|
DSDNet: Deep Structured self-Driving Network
Wenyuan Zeng,
Shenlong Wang,
Renjie Liao,
Yun Chen,
Bin Yang,
Raquel Urtasun
European Conference on Computer Vision
(ECCV), Glasgow, 2020.
paper
Unified framework for multi-modal, socially consistent prediction and safe planning under uncertainty.
In this paper, we propose the Deep Structured self-Driving Network (DSDNet),
which performs object detection, motion prediction, and motion planning with
a single neural network. Towards this goal, we develop a deep structured
energy-based model that considers the interactions between actors and produces
socially consistent multimodal future predictions. Furthermore, DSDNet explicitly
exploits the predicted future distributions of actors to plan a safe maneuver
by using a structured planning cost. Our sample-based formulation allows us to
overcome the difficulty in probabilistic inference of continuous random variables.
Experiments on a number of large-scale self-driving datasets demonstrate that
our model significantly outperforms the state-of-the-art.
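A sketch of the sample-based planning idea: score each candidate ego trajectory by its expected cost under the actors' predicted trajectory samples and probabilities, then pick the minimum. The collision check and the shapes are toy stand-ins for the paper's learned structured cost:

    import numpy as np

    def collides(ego_traj, actor_traj, radius=2.0):
        # Toy collision check: any waypoint pair closer than `radius` meters.
        return float(np.any(np.linalg.norm(ego_traj - actor_traj, axis=1) < radius))

    def plan(ego_candidates, actor_samples, actor_probs):
        """Pick the ego trajectory with the lowest expected collision cost.
        ego_candidates: (K, T, 2); actor_samples: (N, S, T, 2); actor_probs: (N, S)."""
        best, best_cost = None, np.inf
        for ego in ego_candidates:
            cost = 0.0
            for samples, probs in zip(actor_samples, actor_probs):
                # Expected safety cost under the actor's multimodal samples.
                cost += sum(p * collides(ego, s) for s, p in zip(samples, probs))
            if cost < best_cost:
                best, best_cost = ego, cost
        return best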
|
|
V2VNet: Vehicle-to-Vehicle Communication for Joint Perception and Prediction
Tsun-Hsuan Wang,
Sivabalan Manivasagam,
Ming Liang,
Bin Yang,
Wenyuan Zeng,
Raquel Urtasun
European Conference on Computer Vision
(ECCV), Glasgow, 2020.
Oral Presentation
paper
Vehicle-to-vehicle communication network for self-driving.
In this paper, we explore the use of vehicle-to-vehicle (V2V) communication
to improve the perception and motion forecasting performance of self-driving vehicles.
By intelligently aggregating the information received from multiple nearby vehicles,
we can observe the same scene from different viewpoints. This allows us to see through
occlusions and detect actors at long range, where the observations are very sparse or
non-existent. We also show that our approach of sending compressed deep feature map
activations achieves high accuracy while satisfying communication bandwidth requirements.
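Schematically, each vehicle transmits a compressed intermediate feature map and the ego vehicle fuses what it receives. The sketch below assumes received maps are already warped into the ego frame and uses simple mean fusion; the paper uses learned compression and a spatially aware graph neural network for aggregation:

    import torch
    import torch.nn as nn

    class V2VFusion(nn.Module):
        def __init__(self, channels=128, code=32):
            super().__init__()
            self.compress = nn.Conv2d(channels, code, 1)   # bandwidth-friendly code
            self.decompress = nn.Conv2d(code, channels, 1)

        def transmit(self, feats):
            return self.compress(feats)                    # sent over the V2V link

        def fuse(self, ego_feats, received_codes):
            # received_codes: list of (1, code, H, W) maps, already ego-aligned.
            neighbors = [self.decompress(c) for c in received_codes]
            return torch.stack([ego_feats] + neighbors).mean(dim=0)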
|
|
Weakly-supervised 3D Shape Completion in the Wild
Jiayuan Gu,
Wei-Chiu Ma,
Sivabalan Manivasagam,
Wenyuan Zeng,
Zihao Wang,
Yuwen Xiong,
Hao Su,
Raquel Urtasun
European Conference on Computer Vision
(ECCV), Glasgow, 2020.
Spotlight Presentation
paper
Learning 3D shape reconstruction from unaligned and noisy partial
point clouds.
3D shape completion for real data is important but challenging, since partial point clouds
acquired by real-world sensors are usually sparse, noisy and unaligned. Different from previous
methods, we address the problem of learning complete 3D shapes from unaligned, real-world partial
point clouds. To this end, we propose an unsupervised method to estimate both 3D canonical shape and
6-DoF pose for alignment, given multiple partial observations associated with the same instance. The
network jointly optimizes canonical shapes and poses with multi-view geometry constraints during
training, and can infer the complete shape given a single partial point cloud. Moreover, learned pose
estimation can facilitate partial point cloud registration. Experiments on both synthetic and real
data show that it is feasible and promising to learn 3D shape completion through large-scale data
without shape and pose supervision.
|
|
End-to-end Contextual Perception and Prediction with Interaction Transformer
Luke Lingyun Li,
Bin Yang,
Ming Liang,
Wenyuan Zeng,
Mengye Ren,
Sean Segal,
Raquel Urtasun
International Conference on Intelligent Robots and Systems
(IROS), Las Vegas, 2020.
paper
Adapting the Transformer to model multi-agent interactions in trajectory prediction.
In this paper, we tackle the problem of detecting objects in 3D and forecasting their future motion in
the context of self-driving. Towards this goal, we design a novel approach that explicitly takes into
account the interactions between actors. To capture their spatial-temporal dependencies, we propose a
recurrent neural network with a novel Transformer architecture, which we call the Interaction
Transformer. Importantly, our model can be trained end-to-end, and runs in real-time. We validate our
approach on two challenging real-world datasets: ATG4D and nuScenes. We show that our approach can
outperform the state-of-the-art on both datasets. In particular, we significantly improve the social
compliance between the estimated future trajectories, resulting in far fewer collisions between the
predicted actors.
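At its core, the interaction module is attention among actor features. A minimal sketch of one such step (standard scaled dot-product attention; the actual Interaction Transformer adds relative-geometry encodings and recurrence on top):

    import torch
    import torch.nn as nn

    class ActorAttention(nn.Module):
        def __init__(self, dim=128):
            super().__init__()
            self.q = nn.Linear(dim, dim)
            self.k = nn.Linear(dim, dim)
            self.v = nn.Linear(dim, dim)
            self.scale = dim ** -0.5

        def forward(self, actors):              # actors: (N, dim), one row per actor
            attn = (self.q(actors) @ self.k(actors).t()) * self.scale
            weights = attn.softmax(dim=-1)      # how much each actor attends to others
            return actors + weights @ self.v(actors)  # residual update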
|
|
LiDARsim: Realistic LiDAR Simulation by Leveraging the Real World
Sivabalan Manivasagam,
Shenlong Wang,
Kelvin Wong,
Wenyuan Zeng,
Mikita Sazanovich,
Shuhan Tan,
Bin Yang,
Wei-Chiu Ma,
Raquel Urtasun
Computer Vision and Pattern Recognition
(CVPR), Seattle, 2020.
Oral Presentation
paper
High-fidelity LiDAR simulator for closed-loop evaluation of autonomous driving systems.
We tackle the problem of producing realistic simulations of LiDAR point clouds,
the sensor of preference for most self-driving vehicles. We argue that,
by leveraging real data, we can simulate the complex world more realistically
compared to employing virtual worlds built from CAD/procedural models. Towards
this goal, we first build a large catalog of 3D static maps and 3D dynamic objects
by driving around several cities with our self-driving fleet. We can then generate
scenarios by selecting a scene from our catalog and "virtually" placing the
self-driving vehicle (SDV) and a set of dynamic objects from the catalog in
plausible locations in the scene. To produce realistic simulations, we develop
a novel simulator that captures both the power of physics-based and learning-based
simulation. We first utilize ray casting over the 3D scene and then use a deep
neural network to produce deviations from the physics-based simulation, producing
realistic LiDAR point clouds. We showcase LiDARsim's usefulness for testing perception
algorithms on long-tail events and for end-to-end closed-loop evaluation on
safety-critical scenarios.
|
|
PnPNet: End-to-End Perception and Prediction with Tracking in the Loop
Ming Liang*,
Bin Yang*,
Wenyuan Zeng,
Yun Chen,
Rui Hu,
Sergio Casas,
Raquel Urtasun
Computer Vision and Pattern Recognition
(CVPR), Seattle, 2020.
paper
The first single neural network to solve detection, tracking, and prediction in an end-to-end manner.
We tackle the problem of joint perception and motion forecasting in the context of
self-driving vehicles. Towards this goal we propose PnPNet, an end-to-end model that takes as input
sequential sensor data, and outputs at each time step object tracks and their future trajectories. The
key component is a novel tracking module that generates object tracks online from detections and
exploits trajectory level features for motion forecasting. Specifically, the object tracks get updated
at each time step by solving both the data association problem and the trajectory estimation problem.
Importantly, the whole model is end-to-end trainable and benefits from joint optimization of all
tasks. We validate PnPNet on two large-scale driving datasets, and show significant improvements over
the state-of-the-art with better occlusion recovery and more accurate future prediction.
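The data-association step can be sketched with a Hungarian solver over a simple pairwise cost (here, center distance); PnPNet instead learns affinities from trajectory-level features, so this illustrates the structure rather than the method:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def associate(track_centers, det_centers, max_dist=3.0):
        """Match existing tracks to new detections; unmatched detections
        start new tracks."""
        cost = np.linalg.norm(
            track_centers[:, None, :] - det_centers[None, :, :], axis=-1)
        rows, cols = linear_sum_assignment(cost)  # Hungarian matching
        matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] < max_dist]
        unmatched_dets = set(range(len(det_centers))) - {c for _, c in matches}
        return matches, sorted(unmatched_dets)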
|
|
End-to-end Interpretable Neural Motion Planner
Wenyuan Zeng*,
Wenjie Luo*,
Simon Suo,
Abbas Sadat,
Bin Yang,
Sergio Casas,
Raquel Urtasun
Computer Vision and Pattern Recognition
(CVPR), Long Beach, 2019.
Oral Presentation
paper / talk
End-to-end motion planner with interpretable intermediate
representations.
In this paper, we propose a neural motion planner for learning to drive autonomously in
complex urban scenarios that include traffic-light handling, yielding, and interactions
with multiple road-users. Towards this goal, we design a holistic model that takes as input
raw LIDAR data and an HD map and produces interpretable intermediate representations in the
form of 3D detections and their future trajectories, as well as a cost volume defining the
goodness of each position that the self-driving car can take within the planning horizon.
We then sample a set of diverse physically possible trajectories and choose the one with the
minimum learned cost. Importantly, our cost volume is able to naturally capture multi-modality.
We demonstrate the effectiveness of our approach in real-world driving data captured in
several cities in North America. Our experiments show that the learned cost volume can generate safer
plans than all of the baselines.
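The sample-and-score step can be sketched as follows: accumulate each candidate trajectory's cost over the learned cost volume across the planning horizon, then return the cheapest one. Nearest-cell lookup and in-bounds, ego-centered coordinates are simplifying assumptions:

    import numpy as np

    def select_trajectory(cost_volume, trajectories, resolution=0.5):
        """cost_volume: (T, H, W); trajectories: (K, T, 2) in meters (x, y)."""
        K, T, _ = trajectories.shape
        cells = np.round(trajectories / resolution).astype(int)   # (K, T, 2)
        t_idx = np.arange(T)
        costs = np.array([
            cost_volume[t_idx, traj[:, 1], traj[:, 0]].sum()      # row = y, col = x
            for traj in cells
        ])
        return trajectories[int(np.argmin(costs))]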
|
|
Learning to Reweight Examples for Robust Deep Learning
Mengye Ren,
Wenyuan Zeng,
Bin Yang,
Raquel Urtasun
International Conference on Machine Learning
(ICML), Stockholm, 2018.
paper / talk / code
Learning to reweight training examples in an online manner
to overcome noisy or class-imbalanced datasets.
Deep neural networks have been shown to be very powerful modeling tools for many
supervised learning tasks involving complex input patterns. However, they can also easily overfit to
training set biases and label noise. In addition to various regularizers, example reweighting
algorithms are popular solutions to these problems, but they require careful tuning of additional
hyperparameters, such as example mining schedules and regularization hyperparameters. In contrast to
past reweighting methods, which typically consist of functions of the cost value of each example, in
this work we propose a novel meta-learning algorithm that learns to assign weights to training
examples based on their gradient directions. To determine the example weights, our method performs a
meta gradient descent step on the current mini-batch example weights (which are initialized from zero)
to minimize the loss on a clean unbiased validation set. Our proposed method can be easily implemented
on any type of deep network, does not require any additional hyperparameter tuning, and achieves
impressive performance on class imbalance and corrupted label problems where only a small amount of
clean validation data is available.
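The meta-gradient step is compact enough to sketch on a linear classifier (the paper applies the same idea to deep networks via a second forward pass with perturbed weights; the variable names here are illustrative):

    import torch
    import torch.nn.functional as F

    def reweighted_step(w, b, x, y, x_val, y_val, lr=0.1):
        # w, b: parameters with requires_grad=True; (x, y): a noisy training batch.
        eps = torch.zeros(x.size(0), requires_grad=True)   # per-example weights
        losses = F.cross_entropy(x @ w + b, y, reduction="none")
        gw, gb = torch.autograd.grad((eps * losses).sum(), (w, b),
                                     create_graph=True)
        # Virtual SGD step, then evaluate on a small clean validation batch.
        val_loss = F.cross_entropy(x_val @ (w - lr * gw) + (b - lr * gb), y_val)
        g_eps = torch.autograd.grad(val_loss, eps)[0]
        weights = torch.clamp(-g_eps, min=0.0)             # keep helpful examples
        weights = weights / (weights.sum() + 1e-8)         # normalize the batch
        # Real update with the learned example weights (fresh graph).
        losses = F.cross_entropy(x @ w + b, y, reduction="none")
        gw, gb = torch.autograd.grad((weights * losses).sum(), (w, b))
        w_new = (w - lr * gw).detach().requires_grad_()
        b_new = (b - lr * gb).detach().requires_grad_()
        return w_new, b_new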
|
|
Differentiable Compositional Kernel Learning for Gaussian Processes
Shengyang Sun,
Guodong Zhang,
Chaoqi Wang,
Wenyuan Zeng,
Jiaman Li,
Roger Grosse
International Conference on Machine Learning
(ICML), Stockholm, 2018.
paper / talk / code
Neural Kernel Network (NKN), a flexible family of kernels represented
by a neural network.
The generalization properties of Gaussian processes depend heavily on the choice of kernel, and this
choice remains a dark art. We present the Neural Kernel Network (NKN), a flexible family of kernels
represented by a neural network. The NKN architecture is based on the composition rules for kernels,
so that each unit of the network corresponds to a valid kernel. It can compactly approximate
compositional kernel structures such as those used by the Automatic Statistician (Lloyd et al.,
2014), but because the architecture is differentiable, it is end-to-end trainable with
gradient-based optimization. We show that the NKN is universal for the class of stationary kernels.
Empirically we demonstrate pattern discovery and extrapolation abilities of NKN on several tasks
that depend crucially on identifying the underlying structure, including time series and texture
extrapolation, as well as Bayesian optimization.
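The composition rules are simple to state in code: non-negative sums and element-wise products of valid kernels are again valid kernels, so every unit of the network outputs a kernel. A tiny numpy sketch with fixed weights (the NKN learns them end to end):

    import numpy as np

    def k_rbf(X, Y, ls=1.0):
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / ls**2)

    def k_lin(X, Y):
        return X @ Y.T

    def nkn_kernel(X, Y, w=(0.7, 0.3)):
        # "Linear" unit: non-negative weighted sum of primitive kernels.
        k_sum = w[0] * k_rbf(X, Y) + w[1] * k_lin(X, Y)
        # "Product" unit: element-wise product of kernels (also a valid kernel).
        return k_sum * k_rbf(X, Y, ls=5.0)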
|
|
Incorporating Relation Paths in Neural Relation Extraction
Wenyuan Zeng,
Yankai Lin,
Zhiyuan Liu,
Maosong Sun
Conference on Empirical Methods in Natural Language Processing
(EMNLP), Copenhagen, 2017.
paper / code
A path-based neural relation extractor which infers relations using
inference chains.
Distantly supervised relation extraction has been widely used to find novel relational facts from
plain text. To predict the relation between a pair of target entities, existing methods rely solely
on the direct sentences containing both entities. In fact, there are also many sentences
containing only one of the target entities, which provide rich and useful information for relation
extraction. To address this issue, we build inference chains between two target entities via
intermediate entities, and propose a path-based neural relation extraction model to encode the
relational semantics from both direct sentences and inference chains. Experimental results on
real-world datasets show that our model can make full use of those sentences containing only one
target entity, and achieves significant and consistent improvements on relation extraction as
compared with baselines.
|