Piecewise Linear Value Function Approximation for Factored MDPs

Pascal Poupart
Department of Computer Science
University of Toronto
Toronto, ON M5S 3H5
email: ppoupart@cs.toronto.edu

Craig Boutilier
Department of Computer Science
University of Toronto
Toronto, ON M5S 3H5
email: cebly@cs.toronto.edu

Relu Patrascu
Dept. of Computer Science
University of Waterloo
Waterloo, ON, N2L 3G1
email: rpatrasc@cs.uwaterloo.ca

Dale Schuurmans
Dept. of Computer Science
University of Waterloo
Waterloo, ON, N2L 3G1
email: dale@cs.uwaterloo.ca

Abstract
A number of proposals have been put forth in recent years for the solution of Markov decision processes (MDPs) whose state (and sometimes action) spaces are factored. One recent class of methods involves linear value function approximation, where the optimal value function is assumed to be a linear combination of some set of basis functions, with the aim of finding suitable weights. While sophisticated techniques have been developed for finding the best approximation within this constrained space, few methods have been proposed for choosing a suitable basis set, or modifying it if solution quality is found wanting. We propose a general framework, and specific proposals, that address both of these questions. In particular, we examine weakly coupled MDPs where a number of subtasks can be viewed independently modulo resource constraints. We then describe methods for constructing a piecewise linear combination of the subtask value functions, using greedy decision tree techniques. We argue that this architecture is suitable for many types of MDPs whose combinatorics are determined largely by the existence multiple conflicting objectives.

To appear, AAAI-02

Return to List of Papers