Pascal Poupart
Department of Computer Science
University of Toronto
Toronto, ON M5S 3H5
email: ppoupart@cs.toronto.edu
Craig Boutilier
Department of Computer Science
University of Toronto
Toronto, ON M5S 3H5
email: cebly@cs.toronto.edu
Relu Patrascu
Department of Computer Science
University of Waterloo
Waterloo, ON, N2L 3G1
email: rpatrasc@cs.uwaterloo.ca
Dale Schuurmans
Department of Computer Science
University of Waterloo
Waterloo, ON, N2L 3G1
email: dale@cs.uwaterloo.ca
Abstract
A number of proposals have been put forth in recent years for the solution of
Markov decision processes (MDPs) whose state (and sometimes action) spaces are
factored. One recent class of methods involves linear value function
approximation, where the optimal value function is assumed to be a linear
combination of some set of basis functions, with the aim of finding suitable
weights. While sophisticated techniques have been developed for finding the
best approximation within this constrained space, few methods have been
proposed for choosing a suitable basis set, or modifying it if solution quality
is found wanting. We propose a general framework, along with specific methods within it,
that address both of these questions. In particular, we examine weakly coupled
MDPs where a number of subtasks can be viewed independently modulo resource
constraints. We then describe methods for constructing a piecewise linear
combination of the subtask value functions, using greedy decision tree
techniques. We argue that this architecture is suitable for many
types of MDPs whose combinatorics are determined largely by the existence
of multiple conflicting objectives.
To appear, AAAI-02