Rodrigo Toro Icarte
Department of Computer Science
University of Toronto
10 King's College Road, Rm. 3302
Toronto, Ontario
Canada, M5S 3G4

r n t o r o [at] c s [dot] t o r o n t o [dot] e d u
Curriculum Vitae

About Me

Research Direction

I believe that Artificial Intelligence will drive us into the next era of the human race. I hope that someday we will be able to converse, debate, and reason with machines. The results of these interactions are difficult to predict, but they will no doubt have a huge impact on society as it currently exists. Intelligent machines could teach, advise, and help us. Their points of view will be valuable because machines can process far more data than we can, and their conclusions need not be influenced by our cognitive biases. However, creating intelligent machines is one of the most challenging (and exciting) problems that computer science has faced. Though remarkable progress has been made, our machines are still incapable of fully understanding their surroundings. In particular, their lack of commonsense knowledge constrains their ability to behave in a sensible way.

My research focuses on building agents that can learn and use knowledge of "how the world works" to discover optimal behaviors. To do so, I look for connections between inductive methods (Machine Learning and Reinforcement Learning) and deductive methods (Search, Planning, and Knowledge Representation and Reasoning). My hope is that real intelligence lies somewhere between purely inductive and purely deductive approaches.

Academic Bio

I am a PhD student in the Knowledge Representation group at the University of Toronto. I am also a member of the Canadian Artificial Intelligence Association and the Vector Institute. My supervisor is Sheila McIlraith. I did my undergraduate degree in Computer Engineering and my MSc in Computer Science at Pontificia Universidad Católica de Chile (PUC). My master's degree was co-supervised by Alvaro Soto and Jorge Baier. While I was at PUC, I taught the undergraduate course "Introduction to Computer Programming Languages."

Published Work

(NeurIPS19) Learning Reward Machines for Partially Observable Reinforcement Learning
by R. Toro Icarte, E. Waldie, T. Q. Klassen, R. Valenzano, M. P. Castro, and S. A. McIlraith.

Abstract: Reward Machines (RMs), originally proposed for specifying problems in Reinforcement Learning (RL), provide a structured, automata-based representation of a reward function that allows an agent to decompose problems into subproblems that can be efficiently learned using off-policy learning. Here we show that RMs can be learned from experience, instead of being specified by the user, and that the resulting problem decomposition can be used to effectively solve partially observable RL problems. We pose the task of learning RMs as a discrete optimization problem where the objective is to find an RM that decomposes the problem into a set of subproblems such that the combination of their optimal memoryless policies is an optimal policy for the original problem. We show the effectiveness of this approach on three partially observable domains, where it significantly outperforms A3C, PPO, and ACER, and discuss its advantages, limitations, and broader potential.
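
At a high level, the learning problem can be pictured as a search over candidate reward machines, where each candidate is scored by how well the memoryless subpolicies induced by its states solve the task. The skeleton below is only meant to make that loop concrete; the paper formulates the objective as a discrete optimization problem and searches quite differently, and initial_rm, neighbours, and evaluate_decomposition are placeholders.

def learn_reward_machine(initial_rm, neighbours, evaluate_decomposition, iterations=100):
    """Hill-climbing skeleton for the 'search over candidate reward machines' view.

    evaluate_decomposition(rm) is assumed to train one memoryless policy per RM state
    and return how well their combination solves the task (a placeholder for the
    paper's actual objective).
    """
    best_rm = initial_rm
    best_score = evaluate_decomposition(best_rm)
    for _ in range(iterations):
        improved = False
        for rm in neighbours(best_rm):           # e.g., add, remove, or redirect a transition
            score = evaluate_decomposition(rm)
            if score > best_score:
                best_rm, best_score, improved = rm, score, True
        if not improved:
            break                                # local optimum reached
    return best_rm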

Link to the paper, slides, and poster.

Preliminary version: (RLDM19) Searching for Markovian Subproblems to Address Partially Observable Reinforcement Learning [paper, poster].

News: Our work has been accepted for spotlight presentation at NeurIPS!

(CP19) Training Binarized Neural Networks using MIP and CP
by R. Toro Icarte, L. Illanes, M. P. Castro, A. Cire, S. A. McIlraith, and J. C. Beck.

Abstract: Binarized Neural Networks (BNNs) are an important class of neural networks characterized by weights and activations restricted to the set {-1,+1}. BNNs provide simple, compact descriptions and, as such, have a wide range of applications in low-power devices. In this paper, we investigate a model-based approach to training BNNs using constraint programming (CP), mixed-integer programming (MIP), and CP/MIP hybrids. We formulate the training problem as finding a set of weights that correctly classify the training set instances while optimizing objective functions that have been proposed in the literature as proxies for generalizability. Our experimental results on the MNIST digit recognition dataset suggest that—when training data is limited—the BNNs found by our hybrid approach generalize better than those obtained from a state-of-the-art gradient descent method. More broadly, this work enables the analysis of neural network performance based on the availability of optimal solutions and optimality bounds.
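
To make the model-based training idea concrete, here is a heavily simplified sketch: a single binarized neuron with weights in {-1,+1}, trained to optimality with OR-Tools CP-SAT. The paper's models cover full multi-layer BNNs and different proxy objectives; this toy version only requires each training example to be classified with a margin and maximizes the smallest such margin.

from ortools.sat.python import cp_model

def train_binarized_neuron(X, y, max_margin=100):
    """Toy CP model: find weights w_j in {-1,+1} maximizing the minimum classification margin.

    X is a list of input vectors with entries in {-1,+1}; y is a list of labels in {-1,+1}.
    This illustrates the model-based training idea; it is not the formulation from the paper.
    """
    n = len(X[0])
    model = cp_model.CpModel()
    b = [model.NewBoolVar(f'b{j}') for j in range(n)]   # encode w_j = 2*b_j - 1
    margin = model.NewIntVar(0, max_margin, 'margin')
    for xi, yi in zip(X, y):
        # require yi * sum_j w_j * x_ij >= margin, with w_j rewritten as 2*b_j - 1
        activation = sum(2 * xij * b[j] for j, xij in enumerate(xi)) - sum(xi)
        model.Add(yi * activation >= margin)
    model.Maximize(margin)
    solver = cp_model.CpSolver()
    status = solver.Solve(model)
    if status in (cp_model.OPTIMAL, cp_model.FEASIBLE):
        return [2 * solver.Value(bj) - 1 for bj in b], solver.Value(margin)
    return None, None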

Link to the paper, slides, and code.

(RLDM19) Symbolic Planning and Model-Free Reinforcement Learning: Training Taskable Agents
by L. Illanes, X. Yan, R. Toro Icarte, and S. A. McIlraith.

Abstract: We investigate the use of explicit action models—as typically used for Automated Planning—in the context of Reinforcement Learning (RL). Action models are a type of causal model that allows agents to reason about macro-actions and high-level symbolic state spaces. As a consequence, agents with access to an action model and a planner are taskable, which means that the user can give them a goal condition and they will find a sequence of macro-actions to achieve the goal. Such a high-level sequence can be exploited by RL agents through various techniques (e.g., policy sketches, reward machines, hierarchical RL in general). In this paper, we propose a taskable RL agent that can exploit action models to learn new tasks much faster. Our approach is based on state-of-the-art symbolic planning, in combination with hierarchical RL and recent advances in problem decomposition for RL. Empirical results, in tabular and deep RL cases, show that our approach finds high-quality policies for previously unseen tasks in extremely few training steps, consistently outperforming standard Hierarchical RL techniques.
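
A minimal way to picture the "taskable" part: a symbolic planner produces a sequence of macro-actions for the given goal, and the agent executes that sequence by running one learned option policy per macro-action. Everything below (the plan, option_policies, and the environment interface) is a placeholder sketch rather than the system from the paper.

def execute_plan(env, plan, option_policies, max_steps=1000):
    """Follow a high-level plan by running the learned option for each macro-action.

    plan: list of macro-action names produced by a symbolic planner.
    option_policies: dict mapping macro-action name -> (policy(state) -> action,
                     terminated(state) -> bool).
    """
    state, steps = env.reset(), 0
    for macro in plan:
        policy, terminated = option_policies[macro]
        while not terminated(state) and steps < max_steps:
            state, _, done, _ = env.step(policy(state))
            steps += 1
            if done:
                return state
    return state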

Link to the paper and poster.

(IJCAI19) LTL and Beyond: Formal Languages for Reward Function Specification in Reinforcement Learning
by A. Camacho, R. Toro Icarte, T. Q. Klassen, R. Valenzano, and S. A. McIlraith.

Abstract: In Reinforcement Learning (RL), an agent is guided by the rewards it receives from the reward function. Unfortunately, it may take many interactions with the environment to learn from sparse rewards, and it can be challenging to specify reward functions that reflect complex reward-worthy behavior. We propose using reward machines (RMs), which are automata-based representations that expose reward function structure, as a normal form representation for reward functions. We show how specifications of reward in various formal languages, including LTL and other regular languages, can be automatically translated into RMs, easing the burden of complex reward function specification. We then show how the exposed structure of the reward function can be exploited by tailored q-learning algorithms and automated reward shaping techniques in order to improve the sample efficiency of reinforcement learning methods. Experiments show that these RM-tailored techniques significantly outperform state-of-the-art (deep) RL algorithms, solving problems that otherwise cannot reasonably be solved by existing approaches.
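
For the simplest kind of specification, the translation into a reward machine is easy to picture: a task that asks for events e1, ..., en to happen in order becomes a chain of machine states, with reward given only on the final transition. The sketch below handles only such plain sequences; general LTL and regular-language specifications require the automata constructions described in the paper.

def sequence_to_rm(events, reward=1.0):
    """Build a chain-shaped reward machine for "achieve events[0], then events[1], ...".

    Returns (machine states, initial state, transition dict, reward dict, terminal states),
    with transitions keyed by (machine state, event).
    """
    states = list(range(len(events) + 1))
    delta_u, delta_r = {}, {}
    for i, e in enumerate(events):
        delta_u[(i, e)] = i + 1
        delta_r[(i, e)] = reward if i == len(events) - 1 else 0.0
    return states, 0, delta_u, delta_r, {len(events)}

# e.g., "get coffee, then deliver it to the office":
# sequence_to_rm(['coffee', 'office']) yields states [0, 1, 2], with
# (0, 'coffee') -> 1 giving reward 0 and (1, 'office') -> 2 giving reward 1.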

Link to the paper, poster, bibtex, and code.

(ICML18) Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning
by R. Toro Icarte, T. Q. Klassen, R. Valenzano, and S. A. McIlraith.

Abstract: In this paper we propose Reward Machines—a type of finite state machine that supports the specification of reward functions while exposing reward function structure to the learner and supporting decomposition. We then present Q-Learning for Reward Machines (QRM), an algorithm which appropriately decomposes the reward machine and uses off-policy q-learning to simultaneously learn subpolicies for the different components. QRM is guaranteed to converge to an optimal policy in the tabular case, in contrast to Hierarchical Reinforcement Learning methods which might converge to suboptimal policies. We demonstrate this behavior experimentally in two discrete domains. We also show how function approximation methods like neural networks can be incorporated into QRM, and that doing so can find better policies more quickly than hierarchical methods in a domain with a continuous state space.
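
As a rough illustration (not the code from the paper), a reward machine can be represented by a set of states with a transition function and a reward function over high-level events, and QRM can be sketched as keeping one tabular Q-function per machine state and updating all of them from every environment step. The action set, event detector, and hyperparameters below are placeholders.

from collections import defaultdict

ACTIONS = [0, 1, 2, 3]  # placeholder action set

class RewardMachine:
    """A finite-state reward machine over high-level events (a sketch, not the paper's code)."""
    def __init__(self, states, u0, delta_u, delta_r, terminal):
        self.states, self.u0 = states, u0
        self.delta_u, self.delta_r, self.terminal = delta_u, delta_r, terminal

    def step(self, u, event):
        u2 = self.delta_u.get((u, event), u)          # undefined events leave the state unchanged
        return u2, self.delta_r.get((u, event), 0.0)

def make_q(rm):
    # one tabular Q-function per machine state: Q[u][(environment state, action)]
    return {u: defaultdict(float) for u in rm.states}

def qrm_update(rm, Q, s, a, s2, event, gamma=0.9, alpha=0.1):
    """QRM-style counterfactual update: every machine state learns from the same experience."""
    for u in rm.states:
        u2, r = rm.step(u, event)
        best_next = 0.0 if u2 in rm.terminal else max(Q[u2][(s2, b)] for b in ACTIONS)
        Q[u][(s, a)] += alpha * (r + gamma * best_next - Q[u][(s, a)])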

Link to the paper, slides, poster, bibtex, and code.

News: We were selected to give a long talk at ICML!

(AAMAS18) Teaching Multiple Tasks to an RL Agent using LTL
by R. Toro Icarte, T. Q. Klassen, R. Valenzano, and S. A. McIlraith.

Abstract: This paper examines the problem of how to teach multiple tasks to an agent that learns using Reinforcement Learning (RL). To this end, we propose the use of Linear Temporal Logic (LTL) as a compelling language for teaching multiple tasks to an RL agent in a manner that supports composition of learned skills. We also propose a novel algorithm that exploits LTL progression and off-policy RL to speed up learning without compromising convergence guarantees. Experiments over randomly generated Minecraft-like grids illustrate our superior performance relative to the state of the art.
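
The progression idea can be sketched in a few lines: after every step, the LTL task formula is rewritten based on which propositions currently hold, so the remaining formula always encodes what is left to do. Below is a generic, textbook-style progression for a small LTL fragment (not the implementation from the paper); formulas are nested tuples and sigma is the set of propositions true in the current state.

def prog(phi, sigma):
    """Progress an LTL formula phi through a state whose true propositions are sigma."""
    if phi in (True, False):
        return phi
    if isinstance(phi, str):                          # atomic proposition
        return phi in sigma
    op = phi[0]
    if op == 'not':
        return simplify(('not', prog(phi[1], sigma)))
    if op in ('and', 'or'):
        return simplify((op, prog(phi[1], sigma), prog(phi[2], sigma)))
    if op == 'next':                                  # X phi1
        return phi[1]
    if op == 'until':                                 # phi1 U phi2
        keep_waiting = simplify(('and', prog(phi[1], sigma), phi))
        return simplify(('or', prog(phi[2], sigma), keep_waiting))
    if op == 'eventually':                            # F phi1
        return simplify(('or', prog(phi[1], sigma), phi))
    if op == 'always':                                # G phi1
        return simplify(('and', prog(phi[1], sigma), phi))
    raise ValueError(f'unknown operator: {op}')

def simplify(phi):
    """Minimal Boolean simplification so progressed formulas stay small."""
    if isinstance(phi, tuple) and phi[0] == 'not':
        return {True: False, False: True}.get(phi[1], phi)
    if isinstance(phi, tuple) and phi[0] in ('and', 'or'):
        a, b = phi[1], phi[2]
        if phi[0] == 'and':
            if a is False or b is False: return False
            if a is True: return b
            if b is True: return a
        else:
            if a is True or b is True: return True
            if a is False: return b
            if b is False: return a
    return phi

# Example: task = ('until', ('not', 'hole'), 'key')   # "avoid holes until you get the key"
# prog(task, {'key'})  ->  True   (the task has been accomplished)
# prog(task, set())    ->  task   (nothing achieved yet; the whole formula remains)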

Link to the paper, slides, poster, bibtex, and code.

News: I presented this work at the Learning by Instruction workshop at NeurIPS 2018 [slides].

(CCAI18) Advice-Based Exploration in Model-Based Reinforcement Learning
by R. Toro Icarte, T. Q. Klassen, R. Valenzano, and S. A. McIlraith.

Abstract: A reason that current machine learning approaches still fall short of human performance in many areas may lie in the way problems are defined for them. When people try to reach their goals, they are often not limited to figuring out what to do based only on observing the environment, but can also make use of other things, like linguistically expressed advice from other people. We believe that human advice can also benefit artificial agents, and can play a major role in enabling them to solve harder problems.

Method: In this project, we look for ways to use advice when solving MDPs. The advice language includes temporal operators that allow us to recommend, for instance, to "get the key and then go to the door", "collect all the cookies", or "avoid nails and holes". We have experimented using Model-Free and Model-Based Reinforcement Learning. In the video, the agent (green dot) is able to escape from a maze by learning from experience and human advice. To escape from the maze, the agent has to collect the keys (yellow dots) to open the doors (brown squares), while avoiding nails (gray dots) and holes (blue dots) and, hopefully, collecting cookies (brown dots).
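
To make the advice examples concrete, each of them can be written as a small temporal formula over propositions the agent can detect, using the same nested-tuple encoding as in the LTL sketch above. These encodings are only illustrative; the paper's advice language, and the way advice biases exploration in model-based RL, are not sketched here.

# Hypothetical encodings of the advice examples above, using the same nested-tuple
# operators as in the LTL sketch ('eventually' = F, 'always' = G, 'next' = X):
get_key_then_door     = ('eventually', ('and', 'key', ('next', ('eventually', 'door'))))
collect_all_cookies   = ('eventually', 'all_cookies_collected')
avoid_nails_and_holes = ('always', ('and', ('not', 'nail'), ('not', 'hole')))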

Link to the paper and bibtex.

Preliminary version: (RLDM17) Using Advice in Model-Based Reinforcement Learning [paper, poster].

(IJCAI17) How a General-Purpose Commonsense Ontology can Improve Performance of Learning-Based Image Retrieval
by R. Toro Icarte, J. Baier, C. Ruz, and A. Soto.

Abstract: The knowledge representation community has invested great efforts in building general-purpose ontologies which contain large amounts of commonsense knowledge on various aspects of the world. Among the thousands of assertions contained in them, many express relations that can be regarded as relevant to visual inference; e.g., "a ball is used by a football player", "a tennis player is located at a tennis court". In general, current state-of-the-art approaches for visual recognition do not exploit these rule-based knowledge sources. In this project, we study the question of whether or not general-purpose ontologies—specifically, MIT’s ConceptNet ontology—could play a role in state-of-the-art vision systems.

Link to the paper, bibtex, and code.

Other Computer Vision Projects