Artificial Intelligence Seminar

Tuesday, October 11, 4:10 pm

PT 266 (tentative)

Reinforcement learning with partial programs

Stuart Russell

Computer Science Division, UC Berkeley

Hierarchical reinforcement learning is based on the idea that effective behavior in complex environments requires some form of hierarchical structure. Such structure simplifies the process of learning complete behaviors. It also allows the identification of reduced sets of state variables that are relevant to decisions made within each "subroutine" of the hierarchical structure, thereby providing a temporal decomposition of global value functions into small additive components. Guidance as to the correct hierarchical structure can be supplied to an agent in the form of a *partial program* in which choices may be left unspecified. ALisp is a partial programming language, an extension of Lisp, equipped with reinforcement learning algorithms that converge to the optimal complete program consistent with the given partial program. ALisp also includes constructs for specifying asynchronous concurrent activities, which provide further opportunities for decomposing value functions across functional as well as temporal components of behavior. These features enable ALisp agents to learn intelligent behaviors of surprising complexity.
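To make the idea of a partial program concrete, here is a minimal sketch in Python rather than ALisp. It is not taken from the talk: the names `Chooser`, `choose`, `partial_program`, and the toy corridor environment are all hypothetical, and the learner is plain tabular Q-learning over a single choice point, not ALisp's decomposed value-function algorithms. The programmer fixes the control structure (loop until the goal is reached) and leaves the move unspecified; learning fills in that choice.

```python
import random
from collections import defaultdict

class Chooser:
    """Hypothetical helper: Q-learning over (choice label, state, option)."""
    def __init__(self, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
        self.q = defaultdict(float)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def best(self, label, state, options):
        # Greedy option under the current value estimates.
        return max(options, key=lambda o: self.q[(label, state, o)])

    def choose(self, label, state, options):
        # Epsilon-greedy selection at an unspecified choice point.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(options)
        return self.best(label, state, options)

    def update(self, label, state, option, reward, next_state, options, done):
        # Standard one-step Q-learning update.
        target = reward
        if not done:
            target += self.gamma * max(
                self.q[(label, next_state, o)] for o in options)
        key = (label, state, option)
        self.q[key] += self.alpha * (target - self.q[key])

GOAL = 4
MOVES = ["left", "right"]

def step(pos, move):
    """Toy corridor (an assumption): positions 0..GOAL, each move costs 1."""
    nxt = max(0, pos - 1) if move == "left" else min(GOAL, pos + 1)
    return nxt, -1.0, nxt == GOAL

def partial_program(chooser, max_steps=100):
    """Fixed structure: loop until the goal. The move is left unspecified."""
    pos = 0
    for _ in range(max_steps):
        move = chooser.choose("move", pos, MOVES)  # the open choice
        nxt, reward, done = step(pos, move)
        chooser.update("move", pos, move, reward, nxt, MOVES, done)
        pos = nxt
        if done:
            break

chooser = Chooser()
for _ in range(500):
    partial_program(chooser)

# The learned completion: the greedy move at each non-goal position.
learned = [chooser.best("move", s, MOVES) for s in range(GOAL)]
print(learned)
```

The sketch has only one choice point and no subroutines, so it does not show the temporal or concurrent value-function decompositions described above; it only illustrates how a learner can complete a program whose choices were left open.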

[Joint work with Ron Parr, David Andre, and Bhaskara Marthi]