Sergio Casas

Research Scientist @ Uber ATG

Ph.D. Student @ University of Toronto

About me

I’m a Research Scientist on Uber ATG’s R&D team, where I apply my own research to the development of self-driving vehicle technology, focusing on autonomy algorithms ranging from perception to motion planning.

I am also a PhD student at the University of Toronto, and a member of the Machine Learning Group and the Vector Institute.


Interests

  • Machine Learning
  • Computer Vision
  • Robotics - Autonomous Driving
  • Generative Models
  • Imitation Learning


Education

  • PhD in Computer Science, 2020 - Present

    University of Toronto

  • MSc in Computer Science, 2018 - 2020

    University of Toronto

  • BSc in Computer Science, 2013 - 2017

    Universitat Politècnica de Catalunya

  • BSc in Industrial Tech. Engineering, 2012 - 2017

    Universitat Politècnica de Catalunya


Publications

(* denotes equal contribution)

MP3: A Unified Model to Map, Perceive, Predict and Plan

CVPR 2021 (Oral)
Interpretable end-to-end neural motion planning without high-definition maps

LookOut: Diverse Multi-Future Prediction and Planning for Self-Driving

arXiv preprint 2021
Contingency planning from diverse joint trajectory samples for all actors in the scene

TrafficSim: Learning to Simulate Realistic Multi-Agent Behaviors

CVPR 2021
Realistic long-term vehicle behavior simulation learned from imitation and common sense

AdvSim: Generating Safety-Critical Scenarios for Self-Driving Vehicles

CVPR 2021
Critical scenario generation by modifying the actors' trajectories in a physically plausible manner and updating the LiDAR sensor data to create realistic observations of the perturbed world

Deep Multi-Task Learning for Joint Localization, Perception, and Prediction

CVPR 2021
Efficient end-to-end joint localization, perception, and prediction that can correct localization errors

Diverse Complexity Measures for Dataset Curation in Self-driving

arXiv preprint 2021
Model-agnostic approach to dataset curation for autonomy tasks

Safety-Oriented Pedestrian Motion and Scene Occupancy Forecasting

arXiv preprint 2020
Hybrid instance-based and instance-free approach to pedestrian behavior prediction

StrObe: Streaming Object Detection from LiDAR Packets

CoRL 2020 (Spotlight)
Existing LiDAR perception systems wait 100 ms just to build a sweep. StrObe instead performs streaming detection directly from LiDAR packets and achieves an end-to-end latency of 21 ms

Implicit Latent Variable Model for Scene-Consistent Motion Forecasting

ECCV 2020
ILVM characterizes the joint distribution over multiple actors' future trajectories

Perceive, Predict, and Plan: Safe Motion Planning Through Interpretable Semantic Representations

ECCV 2020
End-to-end neural motion planner based on interpretable semantic scene occupancies

RadarNet: Exploiting Radar for Robust Perception of Dynamic Objects

ECCV 2020
Multi-level fusion of LiDAR & Radar for object detection and velocity estimation

The Importance of Prior Knowledge in Precise Multimodal Prediction

IROS 2020 (Oral)
Incorporating non-differentiable prior knowledge into behavior forecasting

PnPNet: End-to-End Perception and Prediction with Tracking in the Loop

CVPR 2020
Tracking in the loop in joint perception and prediction

Spatially-Aware Graph Neural Networks for Relational Behavior Forecasting from Sensor Data

ICRA 2020
Relational reasoning for multi-agent behavior prediction from sensors

Discrete Residual Flow for Probabilistic Pedestrian Behavior Prediction

CoRL 2019 (Spotlight)
Long-term pedestrian forecasting with occupancy grid maps

End-to-end Interpretable Neural Motion Planner

CVPR 2019 (Oral)
Neural motion planner from LiDAR and HD maps

IntentNet: Learning to Predict Intention from Raw Sensor Data

CoRL 2018 (Spotlight)
Joint perception and prediction from LiDAR point clouds and HD maps



Experience

Research Scientist

Uber Advanced Technologies Group

Oct 2017 – Present Toronto, Canada
Research in Autonomous Driving: Perception, Prediction and Motion Planning systems.

Research Assistant

University of Toronto

Feb 2017 – Jul 2017 Toronto, Canada
Research in spatio-temporal reasoning for sports analytics. Worked on automating the NBA Play-by-Play reports. Supervised by Prof. Urtasun.

Data Analytics Consultant

Arcvi Big Data Agency

Jun 2016 – Jan 2017 Barcelona, Spain
Creation of strategy solutions using simple Machine Learning techniques. Advised multiple retail, insurance and credit companies.

Software Engineering Intern

Psycle Interactive Ltd.

Jun 2015 – Aug 2015 Whitchurch, United Kingdom
Mobile application development and UI/UX design. Research project on document topic classification and information retrieval.


Talks

(excluding paper presentations)

Joint Perception and Prediction (PnP)

Talk at EPFL VITA Reading Group covering my research on PnP from IntentNet to ILVM

Prediction Tutorial

Prediction part of the "All About Self-Driving" tutorial at CVPR 2020