Overview

Over the next few decades, millions of people, from various backgrounds and levels of technical expertise, will need to interact effectively with robotic technologies on a daily basis. They will need to modify the behavior of their robots without explicitly writing code, by providing only a small number of kinesthetic or visual demonstrations, or even natural language commands. At the same time, robots should infer and predict the human's intentions and internal objectives from past interactions, in order to provide assistance before it is explicitly requested. This graduate-level course will examine some of the most important papers in imitation learning for robot control, placing more emphasis on developments of the last 10 years. Its purpose is to familiarize students with the frontiers of this research area, to help them identify open problems, and to enable them to make a novel contribution.

Prerequisites

You need to be comfortable with: introductory machine learning concepts (such as from CSC411/CSC413/ECE521 or equivalent), linear algebra, basic multivariable calculus, and introductory probability. You also need strong programming skills in Python. Note: if you do not meet all the prerequisites above, please contact the instructor by email. Optional, but recommended: experience with neural networks (such as from CSC321) and introductory-level familiarity with reinforcement learning and control.

Teaching Staff

Instructor
Florian Shkurti
x@cs.toronto.edu, x=csc2626-instructor
Office Hours: Mon 12-1pm ET, in person at Sandford Fleming 3328 and on Zoom
Teaching Assistants
Jonathan Lorraine, Mohamed Khodeir, and Skylar Hao
Please use csc2626-tas@cs.toronto.edu, not personal emails
Office Hours (Jonathan): Tue 11am-12pm ET, on Zoom
Office Hours (Skylar): Thu 11am-12pm ET, on Zoom

Course Details

Lectures: Wednesdays, 11am-1pm ET (in-person, OISE Building 2-212, lectures recorded on Zoom)
Zoom link is posted on the course's Quercus homepage
Announcements will be posted on Quercus
Discussions will take place on Piazza
Anonymous feedback form for suggested improvements

Grading and Important Dates

  • Assignment 1 (25%): due Oct 3 at 6pm ET
  • Assignment 2 (25%): due Oct 18 at 6pm ET
  • Project Proposal (10%): due Oct 25 at 6pm ET. Students can take on projects in groups of 2-3 people. Tips for a good project proposal can be found here. Proposals should not be based only on papers covered in class by the proposal due date. Students are encouraged to look further ahead in the schedule and to start planning their project definition well ahead of this deadline. Students who need help choosing or crystallizing a project idea should email the instructor or the TAs, come to office hours, or book appointments to discuss ideas.
  • Midterm Progress Report (5%): due Nov 10 at 6pm ET. Tips and expectations for a good midterm progress report are here.
  • Project Presentation (5%): in class on Dec 7. This will be a short presentation, approximately 5 minutes per group, with the exact length depending on the number of groups. More detailed instructions will be posted towards the end of the term.
  • Final Project Report and Code (30%): due Dec 12 at 6pm ET. Tips and expectations for a good final project report can be found here.

Course Description

This course will broadly cover the following areas:

  • Imitating the policies of demonstrators (people, expensive algorithms, optimal controllers)
  • Connections between imitation learning, optimal control, and reinforcement learning
  • Learning the cost functions that best explain a set of demonstrations
  • Shared autonomy between humans and robots for real-time control

Schedule

Lecture Date Topics Slides
1 Sep 14 Introduction
Motivation, logistics, rough description of the topics to be covered.

Imitation vs. Robust Behavioral Cloning
ALVINN: An autonomous land vehicle in a neural network
Visual path following on a manifold in unstructured three-dimensional terrain
End-to-end learning for self-driving cars
A machine learning approach to visual perception of forest trails for mobile robots
DAgger: A reduction of imitation learning and structured prediction to no-regret online learning
Learning monocular reactive UAV control in cluttered natural environments
An invitation to imitation
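
For students who have not yet seen it, here is a minimal sketch of the DAgger loop from the Ross et al. paper above; the env/expert/policy interfaces are hypothetical placeholders, not a prescribed API:

```python
# Minimal DAgger sketch. All interfaces below (env, expert, policy)
# are illustrative placeholders.

def dagger(env, expert, policy, n_iters=10, horizon=100):
    dataset = []  # aggregated (observation, expert_action) pairs
    for _ in range(n_iters):
        obs = env.reset()
        for _ in range(horizon):
            # Roll out the learner's policy, but record the action the
            # expert would have taken in each visited state.
            dataset.append((obs, expert.act(obs)))
            obs, done = env.step(policy.act(obs))
            if done:
                break
        policy.fit(dataset)  # supervised learning on the aggregate
    return policy
```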

Optional Reading
A survey of robot learning from demonstration
ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst
Goal-conditioned imitation learning
Vision-based goal-conditioned policies for underwater navigation in the presence of obstacles
On-policy robot imitation learning from a converging supervisor
Disagreement-regularized imitation learning

Optional Reading: Only Query the Expert when the Learner is Uncertain
Dropout as a Bayesian approximation: representing model uncertainty in deep learning
Dropout: A simple way to prevent neural networks from overfitting
What my deep model doesn't know
Weight uncertainty in neural networks
Maximum mean discrepancy imitation learning
DropoutDAgger: A Bayesian approach to safe imitation learning
SHIV: Reducing supervisor burden in DAgger using support vectors
Query-efficient imitation learning for end-to-end autonomous driving
Consistent estimators for learning to defer to an expert

Quiz 0
Syllabus
Slides
2 Sep 21 Intro to Optimal Control and Model-Based Reinforcement Learning
Linear Quadratic Regulator and some examples
Iterative Linear Quadratic Regulator
Model Predictive Control
Ben Recht: An outsider's tour of RL (watch his ICML'18 tutorial, too)
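
As a concrete reference for the LQR material above, here is a self-contained finite-horizon, discrete-time LQR on a toy double integrator, solved by the backward Riccati recursion; the system and cost matrices are illustrative, not taken from any of the readings:

```python
import numpy as np

# Finite-horizon discrete-time LQR via the backward Riccati recursion.

def lqr_gains(A, B, Q, R, Qf, T):
    P = Qf
    gains = []
    for _ in range(T):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ A - A.T @ P @ B @ K
        gains.append(K)
    return gains[::-1]  # gains[t] is the feedback gain at time step t

dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])   # position-velocity dynamics
B = np.array([[0.0], [dt]])
Q, R, Qf = np.eye(2), 0.1 * np.eye(1), 10.0 * np.eye(2)

x = np.array([[1.0], [0.0]])            # start 1 m from the origin
for K in lqr_gains(A, B, Q, R, Qf, T=50):
    x = A @ x + B @ (-K @ x)            # optimal control u_t = -K_t x_t
```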

Optional Reading: Model-based RL
PILCO: Probabilistic inference for learning control
Deep reinforcement learning in a handful of trials using probabilistic dynamics models
Learning particle dynamics for manipulating rigid bodies, deformable objects, and fluids
End-to-end differentiable physics for learning and control
Synthesizing neural network controllers with probabilistic model based reinforcement learning
A survey on policy search algorithms for learning robot controllers in a handful of trials
Reinforcement learning in robotics: a survey
DeepMPC: Learning deep latent features for model predictive control
Learning latent dynamics for planning from pixels

Optional Reading: Monotonic Improvement of the Value Function
Algorithmic framework for model-based deep reinforcement learning with theoretical guarantees
When to Trust Your Model: Model-Based Policy Optimization

Optional Reading: Learning Dynamics Where it Matters for the Value Function
Value Gradient Weighted Model-Based Reinforcement Learning

Slides
3 Sep 28 Offline / Batch Reinforcement Learning
Scaling data-driven robotics with reward sketching and batch reinforcement learning
Off-policy deep reinforcement learning without exploration
Conservative Q-Learning for offline reinforcement learning
D4RL: Datasets for deep data-driven reinforcement learning
What matters in learning from offline human demonstrations for robot manipulation
NeurIPS 2020 tutorial on offline RL
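
To make the "conservative" idea above concrete, here is a schematic of the CQL(H) loss for a discrete-action Q-network, in PyTorch-style code; q_net, target_net, and the batch fields are placeholder names, not the authors' reference implementation:

```python
import torch
import torch.nn.functional as F

# Schematic CQL(H) loss: standard Bellman error plus a conservative
# regularizer. All names below are illustrative placeholders.

def cql_loss(q_net, target_net, batch, gamma=0.99, alpha=1.0):
    q = q_net(batch.obs)                                   # (B, |A|)
    q_data = q.gather(1, batch.actions.unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        target = batch.rewards + gamma * (1 - batch.dones) * (
            target_net(batch.next_obs).max(dim=1).values)

    bellman = F.mse_loss(q_data, target)
    # Conservative term: push Q down on all actions (log-sum-exp) and
    # back up on the actions actually present in the dataset.
    conservative = (torch.logsumexp(q, dim=1) - q_data).mean()
    return bellman + alpha * conservative
```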

Optional Reading
Offline reinforcement learning: tutorial, review, and perspectives on open problems
Should I run offline reinforcement learning or behavioral cloning?
Why should I trust you, Bellman? The Bellman error is a poor replacement for value error
A minimalist approach to offline reinforcement learning
Benchmarking batch deep reinforcement learning algorithms
Stabilizing off-policy Q-Learning via bootstrapping error reduction
An optimistic perspective on offline reinforcement learning
COG: Connecting new skills to past experience with offline reinforcement learning
IRIS: Implicit reinforcement without interaction at scale for learning control from offline robot manipulation data
(Batch) reinforcement learning for robot soccer
Instabilities of offline RL with pre-trained neural representation
Targeted environment design from offline data

Slides
4 Oct 5 Imitation Learners Guided by Optimal Control Experts and Physics-based Dynamics Models
Learning neural network policies with guided policy search under unknown dynamics
PLATO: Policy learning using adaptive trajectory optimization
Using probabilistic movement primitives in robotics
DeepMimic: Example-guided deep reinforcement learning of physics-based character skills
Dynamic Movement Primitives in robotics: a tutorial survey
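
A minimal one-degree-of-freedom dynamic movement primitive in its standard spring-damper form, as described in the DMP tutorial survey above; the gains are illustrative, and the learned forcing term is left at zero, which reduces the DMP to a smooth reach to the goal:

```python
# Minimal one-DOF discrete DMP. Gains are illustrative; the learned
# forcing term f would normally be fit to a demonstration.

alpha, beta, tau, dt = 25.0, 25.0 / 4.0, 1.0, 0.001
x, v, g = 0.0, 0.0, 1.0                  # start, scaled velocity, goal
for _ in range(int(2.0 / dt)):
    f = 0.0                              # learned forcing term goes here
    v += dt * (alpha * (beta * (g - x) - v) + f) / tau
    x += dt * v / tau
# x has now converged (approximately) to the goal g
```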

Optional Reading
Model-based imitation learning by probabilistic trajectory matching
Combining self-supervised learning and imitation for vision-based rope manipulation
SQIL: Imitation learning via reinforcement learning with sparse rewards
Accelerating online reinforcement learning with offline datasets
Learning movement primitive libraries through probabilistic segmentation

Slides
5 Oct 12 Imitation as Program Induction. Modular Decomposition of Demonstrations into Skills. Imitating Long-Horizon Tasks.
Neural Task Programming: Learning to generalize across hierarchical tasks
TACO: Learning task decomposition via temporal alignment for control
Learning to generalize across long-horizon tasks from human demonstrations
Neural programmer-interpreters
The motion grammar: analysis of a linguistic method for robot control

Optional Reading
Action understanding as inverse planning
Incremental learning of subtasks from unsegmented demonstration
Inducing probabilistic context-free grammars for the sequencing of movement primitives
Neural Task Graphs: Generalizing to unseen tasks from a single video demonstration
Neural program synthesis from diverse demonstration videos
Automata guided reinforcement learning with demonstrations
A syntactic approach to robot imitation learning using probabilistic activity grammars
Robot learning from demonstration by constructing skill trees
Transition state clustering: Unsupervised surgical trajectory segmentation for robot learning
Learning to sequence movement primitives from demonstrations
Imitation-projected programmatic reinforcement learning
Reinforcement and imitation learning for diverse visuomotor skills
Inferring task goals and constraints using Bayesian nonparametric inverse reinforcement learning
You only demonstrate once: category-level manipulation from single visual demonstration
Bottom-up skill discovery from unsegmented demonstrations for long-horizon robot manipulation

Slides
6 Oct 19 Inverse Reinforcement Learning
Maximum entropy inverse reinforcement learning
Active preference-based learning of reward functions
Extrapolating beyond suboptimal demonstrations via inverse reinforcement learning from observations
Large-scale cost function learning for path planning using deep inverse reinforcement learning
Guided Cost Learning: Deep inverse optimal control via policy optimization
Inverse KKT: Learning cost functions of manipulation tasks from demonstrations
Bayesian inverse reinforcement learning
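
As a schematic of the core update in maximum-entropy IRL for a linear reward r(s) = wᵀφ(s): match the expert's expected feature counts. The helpers soft_value_iteration and state_visitation, and the mdp object, are hypothetical stand-ins for tabular MDP routines, not a real library:

```python
# Schematic MaxEnt IRL gradient step (sketch only; helper functions
# below are hypothetical placeholders).

def maxent_irl_step(w, expert_feature_counts, mdp, lr=0.1):
    # Soft-optimal policy under the current reward estimate.
    policy = soft_value_iteration(mdp, rewards=mdp.features @ w)
    # Expected feature counts under that policy's state visitation.
    learner_feature_counts = state_visitation(mdp, policy) @ mdp.features
    # Gradient of the demonstration log-likelihood: expert minus learner.
    return w + lr * (expert_feature_counts - learner_feature_counts)
```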

Optional Reading
Inverse reward design
Nonlinear inverse reinforcement learning with gaussian processes
Maximum margin planning
Compatible reward inverse reinforcement learning
Learning the preferences of ignorant, inconsistent agents
Imputing a convex objective function
IQ-Learn: inverse soft-Q learning for imitation
Better-than-demonstrator imitation learning via automatically-ranked demonstrations

Optional Reading: Applications of IRL
Socially compliant mobile robot navigation via inverse reinforcement learning
Model-based probabilistic pursuit via inverse reinforcement learning
First-person activity forecasting with online inverse reinforcement learning
Learning strategies in table tennis using inverse reinforcement learning
Planning-based prediction for pedestrians
Activity forecasting

Slides
7 Oct 26 Shared Autonomy for Robot Control and Human-in-the-Loop Imitation
Shared autonomy via deep reinforcement learning
Shared autonomy via hindsight optimization
Learning models for shared control of human-machine systems with unknown dynamics
RelaxedIK: Real-time synthesis of accurate and feasible robot arm motion
Human-in-the-loop imitation learning using remote teleoperation
Error-aware imitation learning from teleoperation data for mobile manipulation
Controlling assistive robots with learned latent actions
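
The simplest arbitration scheme underlying several of the shared-autonomy readings above is linear blending of human and robot commands; a minimal sketch, where the weight alpha might come from confidence in the inferred human goal (illustrative, not a fixed API):

```python
import numpy as np

# Linear blending for shared control: arbitrate between the human's
# command and the robot's autonomous command, with alpha in [0, 1].

def blend(u_human: np.ndarray, u_robot: np.ndarray, alpha: float) -> np.ndarray:
    return alpha * u_human + (1.0 - alpha) * u_robot
```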

Optional Reading
Designing robot learners that ask good questions
Blending human and robot inputs for sliding scale autonomy
Inferring and assisting with constraints in shared autonomy
Collaborative control for a robotic wheelchair: evaluation of performance, attention, and workload
Director: A user interface designed for robot operation with shared autonomy
Learning multi-arm manipulation through collaborative teleoperation
Interactive autonomous driving through adaptation from participation

Slides
8 Nov 2 Adversarial Imitation Learning
GAIL: Generative adversarial imitation learning
Learning robust rewards with adversarial inverse reinforcement learning
InfoGAIL: interpretable imitation learning from visual demonstrations
Model-free imitation learning with policy optimization
Imitation learning via off-policy distribution matching
Domain adaptive imitation learning
What matters for adversarial imitation learning?
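
A schematic of one GAIL iteration under a common sign convention: the discriminator separates expert from learner state-action pairs, and the learner is rewarded for fooling it. disc maps (state, action) pairs to logits, and the RL step that consumes the reward (TRPO in the paper) is not shown; all names are placeholders:

```python
import torch
import torch.nn.functional as F

# Schematic GAIL iteration. All names below are placeholders.

def gail_step(disc, disc_opt, expert_sa, policy_sa):
    # 1) Discriminator update: expert pairs labeled 1, policy pairs 0.
    logits_e, logits_p = disc(expert_sa), disc(policy_sa)
    loss = (F.binary_cross_entropy_with_logits(logits_e, torch.ones_like(logits_e))
            + F.binary_cross_entropy_with_logits(logits_p, torch.zeros_like(logits_p)))
    disc_opt.zero_grad()
    loss.backward()
    disc_opt.step()

    # 2) Learner reward: -log(1 - D(s, a)), i.e. reward for fooling D.
    with torch.no_grad():
        reward = -F.logsigmoid(-disc(policy_sa))
    return reward  # fed to an on-policy RL update (not shown)
```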

Slides
9 Nov 9 Reading Week

No lectures or office hours this week (Nov 7-11)

10 Nov 16 Imitation Learning Combined with Reinforcement Learning and Planning. Imitating Long-Horizon Tasks.
AggreVaTe: Reinforcement and imitation learning via interactive no-regret learning
Agile off-road autonomous driving using end-to-end deep imitation learning
End-to-end driving via conditional imitation learning
Relay Policy Learning: solving long-horizon tasks via imitation and reinforcement learning
Deep Q-learning from demonstrations
End-to-end interpretable neural motion planner
Learning complex dexterous manipulation with deep reinforcement learning and demonstrations
Hierarchical imitation and reinforcement learning

Optional Reading: Imitation from Cost-to-Go Queries
Deeply AggreVaTeD: Differentiable imitation learning for sequential prediction
Convergence of value aggregation for imitation learning
Truncated Horizon Policy Search: Combining reinforcement learning & imitation learning
Fast policy learning through imitation and reinforcement

Optional Reading: Imitation and Reinforcement Learning with Imperfect Demonstrations
Reinforcement learning from imperfect demonstrations
Shaping rewards for reinforcement learning with imperfect demonstrations using generative models
Reinforcement learning from imperfect demonstrations under soft expert guidance
Robust imitation learning from noisy demonstrations

Optional Reading: Imitation can Improve Search and Exploration
Overcoming exploration in reinforcement learning with demonstrations
Learning to gather information via imitation
Exploration from demonstration for interactive reinforcement learning
Learning to search via retrospective imitation

Slides
11 Nov 23 Representation Learning and Generalization Guarantees for Imitation Learning
Generalization guarantees for imitation learning
Provable representation learning for imitation with contrastive Fourier features
TRAIL: near-optimal imitation learning with suboptimal data
Representation matters: offline pretraining for sequential decision making
Imitation learning with stability and safety guarantees

Optional Reading
Provable representation learning for imitation learning via bi-level optimization
An empirical investigation of representation learning for imitation
Improving zero-shot generalization in offline reinforcement learning using Generalized Similarity Functions
Neural network training under semidefinite constraints

12 Nov 30 Rewards, Task Specification, and Value Alignment
Policy invariance under reward transformations: theory and applications to reward shaping
Concrete problems in AI safety
Bayesian inference of temporal task specifications from demonstrations
Understanding natural language commands for robotic navigation and mobile manipulation
Can foundation models perform zero-shot task specification for robot manipulation?
Do as I can, not as I say: grounding language in robotic affordances
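
The central result of the first reading above (Ng et al.) fits in one line: adding a potential-based term to the reward preserves the optimal policy, for any potential function. A minimal sketch, with phi an arbitrary illustrative potential:

```python
# Potential-based reward shaping: r'(s, a, s') = r + gamma*phi(s') - phi(s)
# leaves the optimal policy unchanged for any potential function phi.

def shaped_reward(r, s, s_next, phi, gamma=0.99):
    return r + gamma * phi(s_next) - phi(s)
```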

Optional Reading
Cooperative inverse reinforcement learning
Scalable agent alignment via reward modeling: a research direction
Robots that use language
Learning perceptually grounded word meanings from unaligned parallel data
Asking for help using inverse semantics
Learning the reward function for a misspecified model
Perceiver-Actor: A multi-task transformer for robotic manipulation
Code as Policies: Language model programs for embodied control

13 Dec 7 Project Presentations

Recommended, but optional, books

Recommended simulators and datasets

You are encouraged to use the simplest possible simulator to accomplish the task you are interested in. In most cases this means MuJoCo, but feel free to build your own.
For all the starred environments below, please be aware of the 1-machine/student licensing restriction for the MuJoCo physics engine:

Resources for planning, control, and RL

Resources for ML

Recommended courses