Symbolic Plans as High-Level Instructions for Reinforcement Learning

León Illanes, Xi Yan, Rodrigo Toro Icarte, Sheila A. McIlraith

Published in the Proceedings of ICAPS 2020. Also presented at the KR2ML workshop at NeurIPS 2020.

Abstract

Reinforcement Learning (RL) agents seek to maximize the cumulative reward obtained while interacting with their environment. When this reward signal is sparsely distributed, as is the case for final-state goals, the agent may need a very large number of interactions before it learns an adequate policy. Some modern RL approaches address this issue by directly providing the agent with high-level instructions or by specifying reward functions that implicitly encode such instructions. In this work, we explore the use of high-level symbolic action models and Automated Planning techniques to automatically synthesize high-level instructions. We show how high-level plans can be exploited in a Hierarchical RL (HRL) setting, and present an empirical evaluation on multiple sets of final-state goal tasks. Our results show that our approach converges to near-optimal solutions much faster than standard RL and HRL techniques, and that it provides an effective framework for transferring learned skills across multiple tasks in a given environment.
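The core idea admits a compact illustration. The following is a minimal sketch under simplifying assumptions, not the system evaluated in the paper: it uses a toy grid environment, represents the symbolic plan as a fixed sequence of subgoal waypoints rather than the output of an actual planner, and learns one tabular Q-learning option per plan step. The environment class, subgoal coordinates, and hyperparameters are all illustrative. What the sketch does capture is the mechanism described above: each plan step supplies a dense pseudo-reward, so the agent learns a set of short-horizon options instead of one long-horizon policy under a sparse final-state reward.

```python
import random
from collections import defaultdict

class GridWorld:
    """A toy 5x5 grid in which the agent moves in four compass directions."""
    ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]

    def __init__(self, size=5):
        self.size = size
        self.pos = (0, 0)

    def reset(self, pos=(0, 0)):
        self.pos = pos
        return self.pos

    def step(self, a):
        dx, dy = self.ACTIONS[a]
        x = min(max(self.pos[0] + dx, 0), self.size - 1)
        y = min(max(self.pos[1] + dy, 0), self.size - 1)
        self.pos = (x, y)
        return self.pos

def greedy(q, s, n_actions):
    """Pick the action with the highest Q-value in state s."""
    return max(range(n_actions), key=lambda i: q[s][i])

def learn_option(env, subgoal, episodes=500, eps=0.2, alpha=0.5, gamma=0.95):
    """Tabular Q-learning for a single option: reach `subgoal`.
    The subgoal supplies a dense pseudo-reward, so each option is easy
    to learn even though the overall task reward is sparse."""
    n = len(env.ACTIONS)
    q = defaultdict(lambda: [0.0] * n)
    for _ in range(episodes):
        # Random starts so the option policy is valid from any state.
        s = env.reset((random.randrange(env.size), random.randrange(env.size)))
        for _ in range(50):
            a = random.randrange(n) if random.random() < eps else greedy(q, s, n)
            s2 = env.step(a)
            r = 1.0 if s2 == subgoal else 0.0
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
            if r > 0:
                break
    return q

def execute_plan(env, plan, options, max_steps=100):
    """Follow the symbolic plan by running its options in order."""
    s = env.reset()
    for subgoal in plan:
        q = options[subgoal]
        for _ in range(max_steps):
            if s == subgoal:
                break
            s = env.step(greedy(q, s, len(env.ACTIONS)))
    return s

if __name__ == "__main__":
    env = GridWorld()
    # Stand-in for a plan produced by a classical planner, e.g.
    # [get-key, open-door, reach-goal], encoded here as waypoints.
    plan = [(2, 3), (4, 1), (4, 4)]
    options = {g: learn_option(env, g) for g in plan}
    print("Reached:", execute_plan(env, plan, options))  # expect (4, 4)
```

Because each option is learned independently against its own subgoal, the same learned options can be reassembled under a different plan, which is the intuition behind the skill-transfer results reported in the abstract.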

Presentations