Greedy Value Function Approximation for Factored MDPs Abstract

Greedy Linear Value Function Approximation for Factored Markov Decision Processes

Pascal Poupart
Department of Computer Science
University of Toronto
Toronto, ON M5S 3H5
email: ppoupart@cs.toronto.edu

Relu Patrascu
Dept. of Computer Science
University of Waterloo
Waterloo, ON, N2L 3G1
email: rpatrasc@cs.uwaterloo.ca

Dale Schuurmans
Dept. of Computer Science
University of Waterloo
Waterloo, ON, N2L 3G1
email: dale@cs.uwaterloo.ca

Craig Boutilier
Department of Computer Science
University of Toronto
Toronto, ON M5S 3H5
email: cebly@cs.toronto.edu

Carlos Guestrin
Dept. of Computer Science
Stanford University
Stanford, CA 94305-9010
email: guestrin@stanford.edu

Abstract
Significant recent work has focused on using linear representations to approximate value functions for factored Markov decision processes (MDPs). Current research has adopted linear programming as an effective means to calculate approximations for a given set of basis functions, tackling very large MDPs as a result. However, a number of issues remain unresolved:

How accurate are the approximations produced by linear programs?
How hard is it to produce better approximations? and
Where do the basis functions come from?

To address these questions, we first investigate the complexity of minimizing the Bellman error of a linear value function approximation---showing that this is an inherently hard problem. Nevertheless, we provide a branch and bound method for calculating Bellman error and performing approximate policy iteration for general factored MDPs. These methods are more accurate than linear programming, but more expensive. We then consider linear programming itself and investigate methods for automatically constructing sets of basis functions that allow this approach to produce good approximations. The techniques we develop are guaranteed to reduce L₁ error, but also empirically reduce Bellman error. Although L₁ is a weak error measure for value function approximation, our results show that linear programming still yields competitive approximations, and our methods tend to build basis functions that reveal the underlying decompositional structure of the domain.

To appear, AAAI-02

Return to List of Papers