Overview

Over the next few decades, millions of people of varying backgrounds and levels of technical expertise will need to interact effectively with robotic technologies on a daily basis. They will need to modify the behavior of their robots without writing code, by providing only a small number of kinesthetic or visual demonstrations. At the same time, robots should infer and predict their users' intentions and internal objectives from past interactions, so they can offer assistance before it is explicitly requested. This graduate-level course examines some of the most important papers in imitation learning for robot control, with emphasis on developments of the last ten years. Its purpose is to familiarize students with the frontiers of this research area, to help them identify open problems, and to enable them to make a novel contribution.

Prerequisites

You need to be comfortable with introductory machine learning concepts (such as from CSC411/CSC413/ECE521 or equivalent), linear algebra, basic multivariable calculus, and introductory probability. You also need strong programming skills in Python. Note: if you do not meet all of the prerequisites above, please contact the instructor by email. Optional but recommended: experience with neural networks (e.g., from CSC321) and introductory familiarity with reinforcement learning and control.

Teaching Staff

Instructor
Florian Shkurti
x@cs.toronto.edu, x=florian
Office Hours: Wed 2-3pm ET, on Zoom
Teaching Assistant
Homanga Bharadhwaj
y@cs.toronto.edu, y=homanga
Office Hours: Fri 2-3pm ET, on Zoom

Course Details

Lectures: Mondays, 3-5pm ET (online synchronous delivery + recorded lectures)
Zoom link is posted on the course's Quercus homepage
All announcements will be posted on Quercus
Discussions will take place on Piazza
Anonymous feedback form for suggested improvements

Grading and Important Dates

  • Assignment 1 (25%): due Jan 28 at 6pm ET
  • Assignment 2 (25%): due Apr 5 at 6pm ET
  • Project Proposal (10%): due Feb 17 at 6pm ET. Students can take on projects in groups of 2-3 people. Tips for a good project proposal can be found here. Proposals should not be based only on papers covered in class by Feb 17; students are encouraged to look further ahead in the schedule and to start planning their project definition well ahead of this deadline. Students who need help choosing or crystallizing a project idea should email the instructor or the TA.
  • Midterm Progress Report (5%): due Mar 10 at 6pm ET. Tips and expectations for a good midterm progress report are here.
  • Project Presentation (5%): on Apr 5, during class. Each group will give a short presentation, approximately 5-10 minutes depending on the number of groups.
  • Final Project Report and Code (30%): due Apr 12 at 6pm ET. Tips and expectations for a good final project report can be found here.

Course Description

This course will broadly cover the following areas:

  • Imitating the policies of demonstrators (people, expensive algorithms, optimal controllers); a minimal example is sketched after this list
  • Connections between imitation learning, optimal control, and reinforcement learning
  • Learning the cost functions that best explain a set of demonstrations
  • Shared autonomy between humans and robots for real-time control
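
As a taste of the first area, here is a minimal behavioral cloning sketch: imitation reduced to supervised regression from demonstrated states to demonstrated actions. The double-integrator dynamics and the expert gains below are illustrative assumptions, not course-provided code.

import numpy as np

rng = np.random.default_rng(0)

# Toy discrete-time double integrator (illustrative, not course code).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])

def expert_policy(x):
    # Hand-tuned stabilizing feedback that plays the role of the expert.
    return -np.array([1.0, 1.7]) @ x

# Collect demonstrations: (state, action) pairs along expert rollouts.
states, actions = [], []
for _ in range(50):
    x = rng.normal(size=2)
    for _ in range(30):
        u = expert_policy(x)
        states.append(x)
        actions.append(u)
        x = A @ x + (B * u).ravel()

# Behavioral cloning = supervised regression from states to actions.
X, U = np.array(states), np.array(actions)
K_hat, *_ = np.linalg.lstsq(X, U, rcond=None)
print("recovered feedback gains:", -K_hat)  # close to [1.0, 1.7]

DAgger and the other Lecture 1 readings address the distribution shift this naive approach suffers from once the learned policy drifts away from the demonstrated states.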

Schedule

Lecture 1 (Jan 11): Introduction
Motivation, logistics, and an overview of the topics to be covered. (A toy DAgger sketch follows this entry's reading lists.)

Imitation vs. Robust Behavioral Cloning
ALVINN: An autonomous land vehicle in a neural network
Visual path following on a manifold in unstructured three-dimensional terrain
End-to-end learning for self-driving cars
A machine learning approach to visual perception of forest trails for mobile robots
DAgger: A reduction of imitation learning and structured prediction to no-regret online learning
Learning monocular reactive UAV control in cluttered natural environments

Required Background Reading
An invitation to imitation

Optional Reading
A survey of robot learning from demonstration
ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst

Optional Reading: Only Query the Expert when the Learner is Uncertain
Dropout as a Bayesian approximation: representing model uncertainty in deep learning
Dropout: A simple way to prevent neural networks from overfitting
What my deep model doesn't know
Weight uncertainty in neural networks
Maximum mean discrepancy imitation learning
DropoutDAgger: A Bayesian approach to safe imitation learning
SHIV: Reducing supervisor burden in DAgger using support vectors
Query-efficient imitation learning for end-to-end autonomous driving
Consistent estimators for learning to defer to an expert
Quiz 0
Syllabus
Slides
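
As a companion to the DAgger paper above, the sketch below shows the shape of its main loop on a toy linear system: the learner's current policy drives the rollout, the expert labels every state the learner actually visits, and the policy is refit on the aggregated dataset. Dynamics, gains, and iteration counts are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)
A = np.array([[1.0, 0.1], [0.0, 1.0]])         # toy dynamics
B = np.array([[0.0], [0.1]])
expert = lambda x: -np.array([1.0, 1.7]) @ x   # queryable expert

X, U = [], []          # aggregated dataset across iterations
w = np.zeros(2)        # current learner: u = w @ x
for iteration in range(5):
    x = rng.normal(size=2)
    for _ in range(30):
        u = w @ x                       # the learner drives the rollout...
        X.append(x)
        U.append(expert(x))             # ...but the expert labels it
        x = np.clip(A @ x + (B * u).ravel(), -10, 10)
    # Refit on everything gathered so far (dataset aggregation).
    w, *_ = np.linalg.lstsq(np.array(X), np.array(U), rcond=None)
print("learner gains after DAgger:", w)  # approaches [-1.0, -1.7]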
Lecture 2 (Jan 18): Intro to Optimal Control and Model-Based Reinforcement Learning (a toy LQR example follows this entry)
Linear Quadratic Regulator and some examples
Iterative Linear Quadratic Regulator
Model Predictive Control

Required Background Reading
Ben Recht: An outsider's tour of RL (watch his ICML'18 tutorial, too)

Optional Reading
PILCO: Probabilistic inference for learning control
Deep reinforcement learning in a handful of trials using probabilistic dynamics models
Learning particle dynamics for manipulating rigid bodies, deformable objects, and fluids
End-to-end differentiable physics for learning and control
Synthesizing neural network controllers with probabilistic model based reinforcement learning
A survey on policy search algorithms for learning robot controllers in a handful of trials
Reinforcement learning in robotics: a survey
DeepMPC: Learning deep latent features for model predictive control
Learning latent dynamics for planning from pixels
Algorithmic framework for model-based deep reinforcement learning with theoretical guarantees
Slides
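
Since the Linear Quadratic Regulator anchors this lecture, here is a minimal finite-horizon LQR solver using the backward Riccati recursion, on an assumed toy double integrator (an illustrative sketch, not course code).

import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])  # toy double integrator
B = np.array([[0.0], [0.1]])
Q = np.eye(2)            # state cost x^T Q x
R = np.array([[0.1]])    # control cost u^T R u
T = 100                  # horizon

# Backward pass: Riccati recursion for the time-varying gains K_t.
P = Q.copy()
gains = []
for _ in range(T):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)
    gains.append(K)
gains.reverse()

# Forward pass: apply u_t = -K_t x_t from an initial condition.
x = np.array([1.0, 0.0])
for t in range(T):
    u = -(gains[t] @ x)
    x = A @ x + B @ u
print("final state:", x)  # regulated close to the origin

iLQR repeats this recursion around successively re-linearized trajectories, and MPC re-solves a short-horizon problem of this form at every control step.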
Lecture 3 (Jan 25): Offline / Batch Reinforcement Learning (a tabular offline Q-learning sketch follows this entry)
Scaling data-driven robotics with reward sketching and batch reinforcement learning
Off-policy deep reinforcement learning without exploration
Conservative Q-Learning for offline reinforcement learning
D4RL: Datasets for deep data-driven reinforcement learning

Required Background Reading
NeurIPS 2020 tutorial on offline RL

Optional Reading
Offline reinforcement learning: tutorial, review, and perspectives on open problems
Benchmarking batch deep reinforcement learning algorithms
Stabilizing off-policy Q-Learning via bootstrapping error reduction
An optimistic perspective on offline reinforcement learning
COG: Connecting new skills to past experience with offline reinforcement learning
IRIS: Implicit reinforcement without interaction at scale for learning control from offline robot manipulation data
Learning to generalize across long-horizon tasks from human demonstrations
(Batch) reinforcement learning for robot soccer
Slides
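
The batch setting above can be illustrated in a few lines: tabular Q-learning whose updates come only from a fixed log of transitions, with no further environment interaction. The chain MDP and random behavior policy are illustrative assumptions; practical offline methods such as Conservative Q-Learning additionally regularize against actions the dataset never covers.

import numpy as np

rng = np.random.default_rng(2)
n_states, n_actions = 5, 2   # toy chain: action 1 moves right, 0 left

def step(s, a):
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == n_states - 1 else 0.0)

# Log a fixed dataset with a uniformly random behavior policy.
dataset = []
s = 0
for _ in range(2000):
    a = rng.integers(n_actions)
    s2, r = step(s, a)
    dataset.append((s, a, r, s2))
    s = s2 if rng.random() > 0.1 else 0   # occasional resets

# Offline Q-learning: sweep the static dataset; never touch the env again.
Q = np.zeros((n_states, n_actions))
gamma, lr = 0.9, 0.1
for _ in range(200):
    for (s, a, r, s2) in dataset:
        Q[s, a] += lr * (r + gamma * Q[s2].max() - Q[s, a])
print("greedy policy:", Q.argmax(axis=1))   # prefers moving right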
Lecture 4 (Feb 1): Imitation Learning Combined with Reinforcement Learning, Control, and Planning #1
AggreVaTe: Reinforcement and imitation learning via interactive no-regret learning
Agile off-road autonomous driving using end-to-end deep imitation learning
End-to-end driving via conditional imitation learning
Deep Q-learning from demonstrations
End-to-end interpretable neural motion planner

Optional Reading: Imitation from Cost-to-Go Queries
Deeply AggreVaTeD: Differentiable imitation learning for sequential prediction
Convergence of value aggregation for imitation learning
Truncated Horizon Policy Search: Combining reinforcement learning & imitation learning
Fast policy learning through imitation and reinforcement
Slides
Lecture 5 (Feb 8): Imitation as Program Induction and Modular Decomposition of Demonstrations
Neural Task Programming: Learning to generalize across hierarchical tasks
TACO: Learning task decomposition via temporal alignment for control
Learning movement primitive libraries through probabilistic segmentation
Bayesian inference of temporal task specifications from demonstrations
Neural programmer-interpreters

Required Background Reading
The motion grammar: analysis of a linguistic method for robot control

Optional Reading
Action understanding as inverse planning
Incremental learning of subtasks from unsegmented demonstration
Inducing probabilistic context-free grammars for the sequencing of movement primitives
Neural Task Graphs: Generalizing to unseen tasks from a single video demonstration
Neural program synthesis from diverse demonstration videos
Automata guided reinforcement learning with demonstrations
A syntactic approach to robot imitation learning using probabilistic activity grammars
Robot learning from demonstration by constructing skill trees
Transition state clustering: Unsupervised surgical trajectory segmentation for robot learning
Learning to sequence movement primitives from demonstrations
Imitation-projected programmatic reinforcement learning
Slides
Lecture 6 (Feb 15): Reading Week and Family Day (Monday is a holiday, but office hours are still on)
Lecture 7 (Feb 22): Inverse Reinforcement Learning #1 (a tabular MaxEnt IRL sketch follows this entry)
Maximum entropy inverse reinforcement learning
Active preference-based learning of reward functions
Large-scale cost function learning for path planning using deep inverse reinforcement learning
Direct loss minimization inverse optimal control

Optional Reading: Applications of IRL
Socially compliant mobile robot navigation via inverse reinforcement learning
Model-based probabilistic pursuit via inverse reinforcement learning
First-person activity forecasting with online inverse reinforcement learning
Learning strategies in table tennis using inverse reinforcement learning
Planning-based prediction for pedestrians
Activity forecasting
Slides
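
To ground the maximum-entropy formulation above, here is a compact tabular sketch of MaxEnt IRL on an assumed toy chain: a soft backward pass computes the MaxEnt policy for the current reward estimate, a forward pass computes its expected state visitations, and the gradient of the demonstration log-likelihood is the gap between expert and learner visitations. The recovered reward is only identified up to shaping; all problem details are illustrative.

import numpy as np

n_states, n_actions, H = 5, 2, 10   # toy chain MDP, finite horizon

def next_state(s, a):
    return min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)

# Expert demonstrations: always move right from state 0.
expert_mu = np.zeros(n_states)
s = 0
for _ in range(H):
    expert_mu[s] += 1.0
    s = next_state(s, 1)
expert_mu /= H

theta = np.zeros(n_states)   # reward estimate, one value per state
for _ in range(200):
    # Soft (max-ent) backward pass under the current reward.
    V = np.zeros(n_states)
    policy = np.zeros((H, n_states, n_actions))
    for t in reversed(range(H)):
        Q = np.array([[theta[s] + V[next_state(s, a)]
                       for a in range(n_actions)] for s in range(n_states)])
        Vmax = Q.max(axis=1)
        V = Vmax + np.log(np.exp(Q - Vmax[:, None]).sum(axis=1))
        policy[t] = np.exp(Q - V[:, None])
    # Forward pass: expected state visitations of the soft policy.
    d = np.zeros(n_states); d[0] = 1.0
    mu = np.zeros(n_states)
    for t in range(H):
        mu += d
        d_new = np.zeros(n_states)
        for s in range(n_states):
            for a in range(n_actions):
                d_new[next_state(s, a)] += d[s] * policy[t, s, a]
        d = d_new
    mu /= H
    theta += 0.1 * (expert_mu - mu)   # gradient of the demo log-likelihood
print("learned reward:", theta.round(2))  # rises toward the right end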
Lecture 8 (Mar 1): Shared Autonomy for Robot Control with a Human in the Loop (a minimal command-blending sketch follows this entry)
Shared autonomy via deep reinforcement learning
Interactive autonomous driving through adaptation from participation
Shared autonomy via hindsight optimization
Learning models for shared control of human-machine systems with unknown dynamics
RelaxedIK: Real-time synthesis of accurate and feasible robot arm motion

Optional Reading
Designing robot learners that ask good questions
Blending human and robot inputs for sliding scale autonomy
Inferring and assisting with constraints in shared autonomy
Collaborative control for a robotic wheelchair: evaluation of performance, attention, and workload
Director: A user interface designed for robot operation with shared autonomy
Slides
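
The sliding-scale autonomy idea in the readings above reduces, in its simplest form, to a confidence-weighted arbitration between the human's command and the robot's autonomous command. The blending rule and all constants below are illustrative assumptions.

import numpy as np

def blend(u_human, u_robot, confidence):
    # Sliding-scale autonomy: confidence in the robot's inferred goal
    # decides how much authority it takes (0 = all human, 1 = all robot).
    alpha = np.clip(confidence, 0.0, 1.0)
    return (1 - alpha) * u_human + alpha * u_robot

u_h = np.array([1.0, 0.0])      # human joystick command
u_r = np.array([0.6, 0.4])      # robot's command toward its predicted goal
print(blend(u_h, u_r, 0.3))     # mostly human: [0.88, 0.12]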
Lecture 9 (Mar 8): Adversarial Imitation Learning (a discriminator-reward sketch follows this entry)
GAIL: Generative adversarial imitation learning
Model-based adversarial imitation learning
InfoGAIL: interpretable imitation learning from visual demonstrations
Model-free imitation learning with policy optimization
Slides
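
The GAIL loop alternates a discriminator update with a policy update. The sketch below shows only the discriminator half, under one common sign convention (the discriminator estimates the probability that a state-action pair came from the expert), with stand-in Gaussian samples instead of real rollouts; all of it is illustrative.

import numpy as np

rng = np.random.default_rng(3)
# Stand-in (s, a) samples; real GAIL would use demos and policy rollouts.
expert_sa = rng.normal(loc=1.0, size=(256, 2))
learner_sa = rng.normal(loc=0.0, size=(256, 2))

# Logistic-regression discriminator D(s, a) = P(pair came from expert).
w, b = np.zeros(2), 0.0
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
for _ in range(500):
    for sa, label in ((expert_sa, 1.0), (learner_sa, 0.0)):
        p = sigmoid(sa @ w + b)
        err = label - p                 # gradient of the log-likelihood
        w += 0.05 * sa.T @ err / len(sa)
        b += 0.05 * err.mean()

# Surrogate reward for the policy step: high where the discriminator
# mistakes the learner's samples for the expert's.
d = sigmoid(learner_sa @ w + b)
surrogate_reward = -np.log(1.0 - d + 1e-8)
print("mean surrogate reward:", surrogate_reward.mean().round(3))

A full implementation would then take a policy-gradient step (TRPO in the original paper) on this surrogate reward and repeat.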
Lecture 10 (Mar 15): Imitation Learning Combined with Reinforcement Learning, Control, and Planning #2
Learning neural network policies with guided policy search under unknown dynamics
PLATO: Policy learning using adaptive trajectory optimization
Learning complex dexterous manipulation with deep reinforcement learning and demonstrations
Using probabilistic movement primitives in robotics
Goal-conditioned imitation learning

Optional Reading
Model-based imitation learning by probabilistic trajectory matching
DeepMimic: Example-guided deep reinforcement learning of physics-based character skills
Combining self-supervised learning and imitation for vision-based rope manipulation
SQIL: Imitation learning via reinforcement learning with sparse rewards
Accelerating online reinforcement learning with offline datasets
Reinforcement and imitation learning for diverse visuomotor skills
Vision-based goal-conditioned policies for underwater navigation in the presence of obstacles

Optional Reading: Imitation and Reinforcement Learning with Imperfect Demonstrations
Reinforcement learning from imperfect demonstrations
Shaping rewards for reinforcement learning with imperfect demonstrations using generative models
Reinforcement learning from imperfect demonstrations under soft expert guidance
Robust imitation learning from noisy demonstrations

Optional Reading: Imitation can Improve Exploration
Overcoming exploration in reinforcement learning with demonstrations
Learning to gather information via imitation
Exploration from demonstration for interactive reinforcement learning

Slides
Lecture 11 (Mar 22): Inverse Reinforcement Learning #2
Guided Cost Learning: Deep inverse optimal control via policy optimization
Inverse KKT: Learning cost functions of manipulation tasks from demonstrations
Bayesian inverse reinforcement learning
Maximum margin planning
Learning Robust Rewards with Adversarial Inverse Reinforcement Learning

Optional Reading
Inverse reward design
Nonlinear inverse reinforcement learning with gaussian processes
Compatible reward inverse reinforcement learning
Learning the preferences of ignorant, inconsistent agents
Imputing a convex objective function
Slides
Lecture 12 (Mar 29): Rewards & Value Alignment (a shaping-invariance check follows this entry)
Learning the reward function for a misspecified model
Policy invariance under reward transformations: theory and applications to reward shaping
Scalable agent alignment via reward modeling: a research direction
Concrete problems in AI safety
Cooperative inverse reinforcement learning
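
The policy-invariance result cited above can be checked numerically in a few lines: adding a potential-based shaping term F(s, s') = gamma * phi(s') - phi(s), for an arbitrary state potential phi, leaves the greedy optimal policy unchanged. The chain MDP and potential below are illustrative.

import numpy as np

n, gamma = 5, 0.9
rng = np.random.default_rng(4)
phi = rng.normal(size=n)   # arbitrary potential over states

def step(s, a):            # toy chain: a=1 moves right, a=0 moves left
    s2 = min(s + 1, n - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == n - 1 else 0.0)

def greedy_policy(shaped):
    Q = np.zeros((n, 2))
    for _ in range(500):   # tabular value iteration
        for s in range(n):
            for a in range(2):
                s2, r = step(s, a)
                if shaped:
                    r += gamma * phi[s2] - phi[s]   # shaping term F(s, s')
                Q[s, a] = r + gamma * Q[s2].max()
    return Q.argmax(axis=1)

print(greedy_policy(False), greedy_policy(True))   # identical policies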
Lecture 13 (Apr 5): Project Presentations

Recommended (but optional) books

Recommended simulators and datasets

You are encouraged to use the simplest simulator that can accomplish the task you are interested in. In most cases this means MuJoCo, but feel free to build your own.
For all the starred environments below, please be aware of the one-machine-per-student licensing restriction of the MuJoCo physics engine.

Resources for planning, control, and RL

Resources for ML

Recommended courses