Rodrigo Toro Icarte

Rodrigo Toro Icarte
Department of Computer Science
University of Toronto
10 King's College Road, Rm.3302
Toronto, Ontario
Canada, M5S 3G4

r n t o r o [at] c s [dot] t o r o n t o [dot] e d u
Curriculum Vitae
Google Scholar
Semantic Scholar

About Me

Research Direction

Artificial Intelligence (AI) is an aspiration, an idea conceived at the beginning of computer science, that would revolutionize the world if achieved. The ultimate goal of AI is to create agents (i.e., computer programs) that can solve problems in the world as well as humans. To advance this vision, most researchers focus on studying particular aspects of AI independently, such as learning, perception, reasoning, or language. This approach has led to important advances and cutting-edge technologies, but it has a drawback. The agents that result from only considering one aspect of AI have a narrow sense of intelligence. They outperform humans in few concrete tasks (e.g., playing the game of Go) while being useless in other contexts. Building agents that solve problems as well as humans requires a global view of AI. My research embraces this view by incorporating insights from three core aspects of intelligence: knowledge, reasoning, and learning, in service of building general-purpose agents with theoretical guarantees and state-of-the-art performance.

Academic Bio

I am a PhD student in the knowledge representation group at the University of Toronto. I am also a member of the Canadian Artificial Intelligence Association and the Vector Institute. My supervisor is Sheila McIlraith. I did my undergrad in Computer Engineering and MSc in Computer Science at Pontificia Universidad Católica de Chile (PUC). My master's degree was co-supervised by Alvaro Soto and Jorge Baier. While I was at PUC, I taught the undergraduate course "Introduction to Computer Programming" [course material].


Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning
by R. Toro Icarte, T. Q. Klassen, R. Valenzano, and S. A. McIlraith.

Abstract: Reinforcement learning (RL) methods usually treat reward functions as black boxes. As such, these methods must extensively interact with the environment in order to discover rewards and optimal policies. In most RL applications, however, users have to program the reward function and, hence, there is the opportunity to treat reward functions as white boxes instead—to show the reward function's code to the RL agent so it can exploit its internal structures to learn optimal policies faster. In this paper, we show how to accomplish this idea in two steps. First, we propose reward machines (RMs), a type of finite state machine that supports the specification of reward functions while exposing reward function structure. We then describe different methodologies to exploit such structures, including automated reward shaping, task decomposition, and counterfactual reasoning for data augmentation. Experiments on tabular and continuous domains show the benefits of exploiting reward structure across different tasks and RL agents.

Link to the paper and code.

The Act of Remembering: A Study in Partially Observable Reinforcement Learning
by R. Toro Icarte, R. Valenzano, T. Q. Klassen, P. Christoffersen, A. Farahmand, and S. A. McIlraith.

Abstract: Reinforcement Learning (RL) agents typically learn memoryless policies—policies that only consider the last observation when selecting actions. Learning memoryless policies is efficient and optimal in fully observable environments. However, some form of memory is necessary when RL agents are faced with partial observability. In this paper, we study a lightweight approach to tackle partial observability in RL. We provide the agent with an external memory and additional actions to control what, if anything, is written to the memory. At every step, the current memory state is part of the agent's observation, and the agent selects a tuple of actions: one action that modifies the environment and another that modifies the memory. When the external memory is sufficiently expressive, optimal memoryless policies yield globally optimal solutions. Unfortunately, previous attempts to use external memory in the form of binary memory have produced poor results in practice. Here, we investigate alternative forms of memory in support of learning effective memoryless policies. Our novel forms of memory outperform binary and LSTM-based memory in well-established partially observable domains.

Link to the paper.

Be Considerate: Objectives, Side Effects, and Deciding How to Act
by P. Alizadeh Alamdari, T. Q. Klassen, R. Toro Icarte, and S. A. McIlraith.

Abstract: Recent work in AI safety has highlighted that in sequential decision making, objectives are often underspecified or incomplete. This gives discretion to the acting agent to realize the stated objective in ways that may result in undesirable outcomes. We contend that to learn to act safely, a reinforcement learning (RL) agent should include contemplation of the impact of its actions on the wellbeing and agency of others in the environment, including other acting agents and reactive processes. We endow RL agents with the ability to contemplate such impact by augmenting their reward based on expectation of future return by others in the environment, providing different criteria for characterizing impact. We further endow these agents with the ability to differentially factor this impact into their decision making, manifesting behavior that ranges from self-centred to self-less, as demonstrated by experiments in gridworld environments.

Link to the paper.


(ICML21) LTL2Action: Generalizing LTL Instructions for Multi-Task RL
by P. Vaezipoor, A. Li, R. Toro Icarte, and S. A. McIlraith.

Abstract: We address the problem of teaching a deep reinforcement learning (RL) agent to follow instructions in multi-task environments. We employ a well-known formal language -- linear temporal logic (LTL) -- to specify instructions, using a domain-specific vocabulary. We propose a novel approach to learning that exploits the compositional syntax and the semantics of LTL, enabling our RL agent to learn task-conditioned policies that generalize to new instructions, not observed during training. The expressive power of LTL supports the specification of a diversity of complex temporally extended behaviours that include conditionals and alternative realizations. Experiments on discrete and continuous domains demonstrate the strength of our approach in learning to solve (unseen) tasks, given LTL instructions.

Link to the paper and code.

(AAAI21) Interpretable Sequence Classification via Discrete Optimization
by M. Shvo, A. Li, R. Toro Icarte, and S. A. McIlraith.

Abstract: Sequence classification is the task of predicting a class label given a sequence of observations. In many applications such as healthcare monitoring or intrusion detection, early classification is crucial to prompt intervention. In this work, we learn sequence classifiers that favour early classification from an evolving observation trace. While many state-of-the-art sequence classifiers are neural networks, and in particular LSTMs, our classifiers take the form of finite state automata and are learned via discrete optimization. Our automata-based classifiers are interpretable—supporting explanation, counterfactual reasoning, and human-in-the-loop modification—and have strong empirical performance. Experiments over a suite of goal recognition and behaviour classification datasets show our learned automata-based classifiers to have comparable test performance to LSTM-based classifiers, with the added advantage of being interpretable.

Link to the paper.

(CCAI21) AppBuddy: Learning to Accomplish Tasks in Mobile Apps via Reinforcement Learning
by M. Shvo, Z. Hu, R. Toro Icarte, I. Mohomed, A. Jepson, and S. A. McIlraith.

Abstract: Human beings, even small children, quickly become adept at figuring out how to use applications on their mobile devices. Learning to use a new app is often achieved via trial-and-error, accelerated by transfer of knowledge from past experiences with like apps. The prospect of building a smarter smartphone - one that can learn how to achieve tasks using mobile apps - is tantalizing. In this paper we explore the use of Reinforcement Learning (RL) with the goal of advancing this aspiration. We introduce an RL-based framework for learning to accomplish tasks in mobile apps. RL agents are provided with states derived from the underlying representation of on-screen elements, and rewards that are based on progress made in the task. Agents can interact with screen elements by tapping or typing. Our experimental results, over a number of mobile apps, show that RL agents can learn to accomplish multi-step tasks, as well as achieve modest generalization across different apps. More generally, we develop a platform which addresses several engineering challenges to enable an effective RL training environment. Our AppBuddy platform is compatible with OpenAI Gym and includes a suite of mobile apps and benchmark tasks that supports a diversity of RL research in the mobile app setting.

Link to the paper.

(ICAPS20) Symbolic Plans as High-Level Instructions for Reinforcement Learning
by L. Illanes, X. Yan, R. Toro Icarte, and S. A. McIlraith.

Abstract: Reinforcement learning (RL) agents seek to maximize the cumulative reward obtained when interacting with their environment. Users define tasks or goals for RL agents by designing specialized reward functions such that maximization aligns with task satisfaction. This work explores the use of high-level symbolic action models as a framework for defining final-state goal tasks and automatically producing their corresponding reward functions. We also show how automated planning can be used to synthesize high-level plans that can guide hierarchical RL (HRL) techniques towards efficiently learning adequate policies. We provide a formal characterization of taskable RL environments and describe sufficient conditions that guarantee we can satisfy various notions of optimality (e.g., minimize total cost, maximize probability of reaching the goal). In addition, we do an empirical evaluation that shows that our approach converges to near-optimal solutions faster than standard RL and HRL methods and that it provides an effective framework for transferring learned skills across multiple tasks in a given environment.

Link to the paper.

News: This paper won the best paper award at the KR2ML workshop at NeurIPS20 (sponsored by Amazon Science)! [paper].

Preliminary version: (RLDM19) Symbolic Planning and Model-Free Reinforcement Learning: Training Taskable Agents [paper, poster].

(NeurIPS19) Learning Reward Machines for Partially Observable Reinforcement Learning
by R. Toro Icarte, E. Waldie, T. Q. Klassen, R. Valenzano, M. P. Castro, and S. A. McIlraith.

Abstract: Reward Machines (RMs), originally proposed for specifying problems in Reinforcement Learning (RL), provide a structured, automata-based representation of a reward function that allows an agent to decompose problems into subproblems that can be efficiently learned using off-policy learning. Here we show that RMs can be learned from experience, instead of being specified by the user, and that the resulting problem decomposition can be used to effectively solve partially observable RL problems. We pose the task of learning RMs as a discrete optimization problem where the objective is to find an RM that decomposes the problem into a set of subproblems such that the combination of their optimal memoryless policies is an optimal policy for the original problem. We show the effectiveness of this approach on three partially observable domains, where it significantly outperforms A3C, PPO, and ACER, and discuss its advantages, limitations, and broader potential.

Link to the paper, code, slides, poster, and video.

News 1: Our work has been accepted for spotlight presentation at NeurIPS!
News 2: I presented this work at the Recently Published Research track at KR 2020 [slides].

Preliminary version: (RLDM19) Searching for Markovian Subproblems to Address Partially Observable Reinforcement Learning [paper, poster].

(CP19) Training Binarized Neural Networks using MIP and CP
by R. Toro Icarte, L. Illanes, M. P. Castro, A. Cire, S. A. McIlraith, and J. C. Beck.

Abstract: Binarized Neural Networks (BNNs) are an important class of neural network characterized by weights and activations restricted to the set {-1,+1}. BNNs provide simple compact descriptions and as such have a wide range of applications in low-power devices. In this paper, we investigate a model-based approach to training BNNs using constraint programming (CP), mixed-integer programming (MIP), and CP/MIP hybrids. We formulate the training problem as finding a set of weights that correctly classify the training set instances while optimizing objective functions that have been proposed in the literature as proxies for generalizability. Our experimental results on the MNIST digit recognition dataset suggest that—when training data is limited—the BNNs found by our hybrid approach generalize better than those obtained from a state-of-the-art gradient descent method. More broadly, this work enables the analysis of neural network performance based on the availability of optimal solutions and optimality bounds.

Link to the paper, slides, and code.

(IJCAI19) LTL and Beyond: Formal Languages for Reward Function Specification in Reinforcement Learning
by A. Camacho, R. Toro Icarte, T. Q. Klassen, R. Valenzano, and S. A. McIlraith.

Abstract: In Reinforcement Learning (RL), an agent is guided by the rewards it receives from the reward function. Unfortunately, it may take many interactions with the environment to learn from sparse rewards, and it can be challenging to specify reward functions that reflect complex reward-worthy behavior. We propose using reward machines (RMs), which are automata-based representations that expose reward function structure, as a normal form representation for reward functions. We show how specifications of reward in various formal languages, including LTL and other regular languages, can be automatically translated into RMs, easing the burden of complex reward function specification. We then show how the exposed structure of the reward function can be exploited by tailored q-learning algorithms and automated reward shaping techniques in order to improve the sample efficiency of reinforcement learning methods. Experiments show that these RM-tailored techniques significantly outperform state-of-the-art (deep) RL algorithms, solving problems that otherwise cannot reasonably be solved by existing approaches.

Link to the paper, poster, bibtex, and code.

(ICML18) Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning
by R. Toro Icarte, T. Q. Klassen, R. Valenzano, and S. A. McIlraith.

Abstract: In this paper we propose Reward Machines—a type of finite state machine that supports the specification of reward functions while exposing reward function structure to the learner and supporting decomposition. We then present Q-Learning for Reward Machines (QRM), an algorithm which appropriately decomposes the reward machine and uses off-policy q-learning to simultaneously learn subpolicies for the different components. QRM is guaranteed to converge to an optimal policy in the tabular case, in contrast to Hierarchical Reinforcement Learning methods which might converge to suboptimal policies. We demonstrate this behavior experimentally in two discrete domains. We also show how function approximation methods like neural networks can be incorporated into QRM, and that doing so can find better policies more quickly than hierarchical methods in a domain with a continuous state space.

Link to the paper, slides, poster, bibtex, code, and video.

News: We were selected to give a long talk at ICML!

(AAMAS18) Teaching Multiple Tasks to an RL Agent using LTL
by R. Toro Icarte, T. Q. Klassen, R. Valenzano, and S. A. McIlraith.

Abstract: This paper examines the problem of how to teach multiple tasks to an agent that learns using Reinforcement Learning (RL). To this end, we propose the use of Linear Temporal Logic (LTL) as a compelling language for teaching multiple tasks to an RL agent in a manner that supports composition of learned skills. We also propose a novel algorithm that exploits LTL progression and off-policy RL to speed up learning without compromising convergence guarantees. Experiments over randomly generated Minecraft-like grids illustrate our superior performance relative to the state of the art.

Link to the paper, slides, poster, bibtex, and code.

News: I presented this work at the learning by instruction workshop at NeurIPS 2018 [slides].

(CCAI18) Advice-Based Exploration in Model-Based Reinforcement Learning
by R. Toro Icarte, T. Q. Klassen, R. Valenzano, and S. A. McIlraith.

Abstract: A reason that current machine learning approaches still fall short of human performance in many areas may lie in the way problems are defined for them. When people try to reach their goals, they often are not limited to figuring out what to do based only on observing the environment, but can also make use of other things, like linguistically expressed advice from other people. We believe that human advice can also benefit artificial agents, and can play a major role in enabling them to solve harder problems.

Method: In this project, we look for ways to use advice when solving MDPs. The advice language includes temporal operators that allow us to recommend, for instance, to "get the key and then go to the door", "collect all the cookies", or "avoid nails and holes". We have experimented using Model-Free and Model-Based Reinforcement Learning. In the video, the agent (green dot) is able to escape from a maze by learning from experience and human advice. To escape from the maze, the agent has to collect the keys (yellow dots) to open the doors (brown square), while avoiding nails (gray dots) and holes (blue dots) and, hopefully, collecting cookies (brown dots).

Link to the paper and bibtex.

Preliminary version: (RLDM17) Using Advice in Model-Based Reinforcement Learning [paper, poster].

(IJCAI17) How a General-Purpose Commonsense Ontology can Improve Performance of Learning-Based Image Retrieval
by R. Toro Icarte, J. Baier, C. Ruz, and A. Soto.

Abstract: The knowledge representation community has invested great efforts in building general-purpose ontologies which contain large amounts of commonsense knowledge on various aspects of the world. Among the thousands of assertions contained in them, many express relations that can be regarded as relevant to visual inference; e.g., "a ball is used by a football player", "a tennis player is located at a tennis court". In general, current state-of-the-art approaches for visual recognition do not exploit these rule-based knowledge sources. In this project, we study the question of whether or not general-purpose ontologies—specifically, MIT’s ConceptNet ontology—could play a role in state-of-the-art vision systems.

Links: to the paper, slides, poster, bibtex, and code.

Other Computer Vision Projects