Seyed Kamyar Seyed Ghasemipour

kamyar (at) cs {dot} toronto [dot] edu

University of Toronto

Vector Institute

I'm a graduate student in the Machine Learning Group at the University of Toronto and the Vector Institute, supervised by Rich Zemel. Broadly, my interests lie at the intersection of Reinforcement Learning and Probabilistic methods. More specifically, the problems I enjoy thinking about are motivated by two often-incompatible directions: developing and understanding algorithms for practical impact, and building AGI (building Iron Man's Jarvis is the reason I got into A.I.). In past lives I did research in Computer Vision and Generative Models.

You can find my CV here.

Announcements

  • July 22, 2020 — New Preprint

    Our preprint "EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL" (with Dale Schuurmans and Shane Gu) is up on arXiv.

  • November 1, 2019 — Best Paper Award @ CoRL 2019!!!! :D

    Our paper "A Divergence Minimization Perspective on Imitation Learning Methods" (with Richard Zemel and Shane Gu) received the Best Paper Award at the Conference on Robot Learning (CoRL) 2019!

Earlier Announcements

  • September 30, 2019 — Research Internship @ Google Brain Robotics

    This semester I am interning with Corey Lynch and Pierre Sermanet at Google Brain Robotics in Mountain View.

  • September 7, 2019 — CoRL Paper (Oral! :D)

    Our paper "A Divergence Minimization Perspective on Imitation Learning Methods" (with Richard Zemel and Shane Gu) was accepted as an oral at CoRL 2019!

  • September 4, 2019 — NeurIPS Paper

    Our paper "SMILe: Scalable Meta Inverse Reinforcement Learning through Context-Conditional Policies" (with Shane Gu and Richard Zemel) was accepted as a poster at NeurIPS 2019!

  • June 1, 2019 — ICML Workshop Oral Presentation

    Our paper "SMILe: Scalable Meta Inverse Reinforcement Learning through Context-Conditional Policies" (with Shane Gu and Richard Zemel) was accepted as an oral presentation to the Imitation, Intent, and Interaction (I3) Workshop at ICML 2019!

  • June 1, 2019 — ICML Workshop Poster

    Our paper "Interpreting Imitation Learning Methods Under a Divergence Minimization Perspective" (with Shane Gu and Richard Zemel) was accepted to the Imitation, Intent, and Interaction (I3) Workshop at ICML 2019!

  • April 20, 2019 — ICLR Workshop Poster

    Our paper "Interpreting Imitation Learning Methods Under a Divergence Minimization Perspective" (with Shane Gu and Richard Zemel) was accepted to the Deep Generative Models for Highly Structured Data Workshop at ICLR 2019!

Papers

Preprints / Under Review

  • EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL
    Seyed Kamyar Seyed Ghasemipour, Dale Schuurmans, Shane Gu
    Preprint, Under Review

    abstract

    Off-policy reinforcement learning (RL) holds the promise of sample-efficient learning of decision-making policies by leveraging past experience. However, in the offline RL setting -- where a fixed collection of interactions is provided and no further interactions are allowed -- it has been shown that standard off-policy RL methods can significantly underperform. Recently proposed methods aim to address this shortcoming by regularizing learned policies to remain close to the given dataset of interactions. However, these methods involve several configurable components such as learning a separate policy network on top of a behavior cloning actor, and explicitly constraining action spaces through clipping or reward penalties. Striving for simultaneous simplicity and performance, in this work we present a novel backup operator, Expected-Max Q-Learning (EMaQ), which naturally restricts learned policies to remain within the support of the offline dataset without any explicit regularization, while retaining desirable theoretical properties such as contraction. We demonstrate that EMaQ is competitive with Soft Actor Critic (SAC) in online RL, and surpasses SAC in the deployment-efficient setting. In the offline RL setting -- the main focus of this work -- through EMaQ we are able to make important observations regarding key components of offline RL, and the nature of standard benchmark tasks. Lastly, and importantly, we observe that EMaQ achieves state-of-the-art performance with fewer moving parts, such as one fewer function approximator, making it a strong yet easy-to-implement baseline for future work.

    / arxiv
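
    To make the abstract above concrete, here is a minimal sketch of the backup it describes, in my own notation (which may differ from the paper's). Given a behavior policy $\mu$ fit to the offline dataset, the Expected-Max backup bootstraps only from actions sampled from $\mu$:

    $\mathcal{T}^N_\mu Q(s,a) = r(s,a) + \gamma\, \mathbb{E}_{s' \sim p(\cdot \mid s, a)}\, \mathbb{E}_{a'_1, \dots, a'_N \sim \mu(\cdot \mid s')} \big[ \max_{i} Q(s', a'_i) \big]$

    Because every candidate $a'_i$ is drawn from $\mu$, the induced policy stays within the support of the dataset without an explicit regularizer, and $N$ interpolates between evaluating the behavior policy ($N = 1$) and maximizing over its support (large $N$).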

Conference Publications

  • A Divergence Minimization Perspective on Imitation Learning Methods
    Seyed Kamyar Seyed Ghasemipour, Richard Zemel, Shane Gu
    Best Paper Award, Oral Presentation, CoRL 2019

    abstract

    In many settings, it is desirable to learn decision-making and control policies through learning or bootstrapping from expert demonstrations. The most common approaches under this Imitation Learning (IL) framework are Behavioural Cloning (BC), and Inverse Reinforcement Learning (IRL). Recent methods for IRL have demonstrated the capacity to learn effective policies with access to a very limited set of demonstrations, a scenario in which BC methods often fail. Unfortunately, due to multiple factors of variation, directly comparing these methods does not provide adequate intuition for understanding this difference in performance. In this work, we present a unified probabilistic perspective on IL algorithms based on divergence minimization. We present $f$-MAX, an $f$-divergence generalization of AIRL [Fu et al., 2018], a state-of-the-art IRL method. $f$-MAX enables us to relate prior IRL methods such as GAIL [Ho & Ermon, 2016] and AIRL [Fu et al., 2018], and understand their algorithmic properties. Through the lens of divergence minimization we tease apart the differences between BC and successful IRL approaches, and empirically evaluate these nuances on simulated high-dimensional continuous control domains. Our findings conclusively identify that IRL's state-marginal matching objective contributes most to its superior performance. Lastly, we apply our new understanding of IL methods to the problem of state-marginal matching, where we demonstrate that in simulated arm pushing environments we can teach agents a diverse range of behaviours using simply hand-specified state distributions and no reward functions or expert demonstrations. For datasets and reproducing results please refer to https://github.com/KamyarGh/rl_swiss/blob/master/reproducing/fmax_paper.md .

    / camera-ready pdf / code
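
    As a rough companion to the abstract above (my notation, not necessarily the paper's): writing $\rho^{\mathrm{exp}}(s,a)$ and $\rho^{\pi}(s,a)$ for the expert's and the policy's state-action marginals, the $f$-MAX family casts imitation as

    $\min_{\pi}\; D_f\big(\rho^{\mathrm{exp}}(s,a) \,\|\, \rho^{\pi}(s,a)\big)$

    with particular choices of $f$ recovering, roughly, GAIL (Jensen-Shannon divergence) and AIRL (reverse KL), whereas behavioural cloning only matches the conditional action distributions $\pi(a \mid s)$ under the expert's states; this marginal-versus-conditional distinction is the gap the paper teases apart.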
  • SMILe: Scalable Meta Inverse Reinforcement Learning through Context-Conditional Policies
    Seyed Kamyar Seyed Ghasemipour, Shane Gu, Richard Zemel
    NeurIPS 2019

    abstract

    Imitation Learning (IL) has been successfully applied to complex sequential decision-making problems where standard Reinforcement Learning (RL) algorithms fail. A number of recent methods extend IL to few-shot learning scenarios, where a meta-trained policy learns to quickly master new tasks using limited demonstrations. However, although Inverse Reinforcement Learning (IRL) often outperforms Behavioral Cloning (BC) in terms of imitation quality, most of these approaches build on BC due to its simple optimization objective. In this work, we propose SMILe, a scalable framework for Meta Inverse Reinforcement Learning (Meta-IRL) based on maximum entropy IRL, which can learn high-quality policies from few demonstrations. We examine the efficacy of our method on a variety of high-dimensional simulated continuous control tasks and observe that SMILe significantly outperforms Meta-BC. Furthermore, we observe that SMILe performs comparably to or outperforms Meta-DAgger, while being applicable in the state-only setting and not requiring online experts. To our knowledge, our approach is the first efficient method for Meta-IRL that scales to the function approximator setting. For datasets and reproducing results please refer to https://github.com/KamyarGh/rl_swiss/blob/master/reproducing/smile_paper.md .

    / camera-ready pdf / code

Workshop Publications

  • SMILe: Scalable Meta Inverse Reinforcement Learning through Context-Conditional Policies
    Seyed Kamyar Seyed Ghasemipour, Shane Gu, Richard Zemel
    Imitation, Intent, and Interaction (I3) Workshop, ICML 2019 (Oral Presentation)

    abstract

    Imitation Learning (IL) has been successfully applied to complex sequential decision-making problems where standard Reinforcement Learning (RL) algorithms fail. A number of recent methods extend IL to few-shot learning scenarios, where a meta-trained policy learns to quickly master new tasks using limited demonstrations. However, although Inverse Reinforcement Learning (IRL) often outperforms Behavioral Cloning (BC) in terms of imitation quality, most of these approaches build on BC due to its simple optimization objective. In this work, we propose SMILe, a scalable framework for Meta Inverse Reinforcement Learning (Meta-IRL) based on maximum entropy IRL, which can learn high-quality policies from few demonstrations. We examine the efficacy of our method on a variety of high-dimensional simulated continuous control tasks and observe that SMILe significantly outperforms Meta-BC. To our knowledge, our approach is the first efficient method for Meta-IRL that scales to the intractable function approximator setting.

    / pdf / code (coming soon) / poster
  • Interpreting Imitation Learning Methods Under a Divergence Minimization Perspective
    Seyed Kamyar Seyed Ghasemipour, Richard Zemel, Shane Gu
    Imitation, Intent, and Interaction (I3) Workshop, ICML 2019
    Deep Generative Models for Highly Structured Data Workshop, ICLR 2019

    abstract

    In many settings, it is desirable to learn decision-making and control policies through learning or bootstrapping from expert demonstrations. The most common approaches under this framework are Behaviour Cloning (BC), and Inverse Reinforcement Learning (IRL). Recent methods for IRL have demonstrated the capacity to learn effective policies with access to a very limited set of demonstrations, a scenario in which BC methods often fail. Unfortunately, directly comparing the algorithms for these methods does not provide adequate intuition for understanding this difference in performance. This is the motivating factor for our work. We begin by presenting $f$-MAX, a generalization of AIRL (Fu et al., 2018), a state-of-the-art IRL method. $f$-MAX provides grounds for more directly comparing the objectives for LfD. We demonstrate that $f$-MAX, and by inheritance AIRL, is a subset of the cost-regularized IRL framework laid out by Ho & Ermon (2016). We conclude by empirically evaluating the factors of difference between various LfD objectives in the continuous control domain.

    / pdf / code (coming soon) / poster
  • Gradient-Based Optimization of Neural Network Architecture
    Will Grathwohl*, Elliot Creager*, Seyed Kamyar Seyed Ghasemipour*, Richard Zemel
    Workshop, ICLR 2018

    abstract

    Neural networks can learn relevant features from data, but their predictive accuracy and propensity to overfit are sensitive to the values of the discrete hyperparameters that specify the network architecture (number of hidden layers, number of units per layer, etc.). Previous work optimized these hyperparameters via grid search, random search, and black box optimization techniques such as Bayesian optimization. Bolstered by recent advances in gradient-based optimization of discrete stochastic objectives, we instead propose to directly model a distribution over possible architectures and use variational optimization to jointly optimize the network architecture and weights in one training pass. We discuss an implementation of this approach that estimates gradients via the Concrete relaxation, and show that it finds compact and accurate architectures for convolutional neural networks applied to the CIFAR10 and CIFAR100 datasets.

    / pdf
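
    A minimal sketch of the objective described above, in my own notation (details may differ from the paper): place a distribution $q_\phi(\alpha)$ over the discrete architecture choices $\alpha$ (depth, units per layer, etc.) and optimize it jointly with the network weights $w$ in one training pass,

    $\min_{\phi,\, w}\; \mathbb{E}_{\alpha \sim q_\phi(\alpha)}\big[\mathcal{L}(\alpha, w)\big]$

    where sampling $\alpha$ is made differentiable with the Concrete (Gumbel-Softmax) relaxation so the gradient with respect to $\phi$ can be estimated by reparameterization.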

Unpublished Submissions

  • Semi-Supervised Structured Prediction with the Use of Generative Adversarial Networks
    Seyed Kamyar Seyed Ghasemipour, Yujia Li, Jackson Wang, Richard Zemel
    Submitted to ICCV 2017

Slides

    SMILe Oral Presentation, Imitation, Intent, and Interaction (I3) Workshop at ICML 2019

    Videos

    SMILe (link coming soon), 15 min, June 15, 2019, Oral Presentation at the Imitation, Intent, and Interaction (I3) Workshop, ICML 2019

    Summer 2015 Research Video (1st place, undergraduate research video competition)

    Summer 2014 Research Video (1st place, undergraduate research video competition)