###
Greedy Linear Value Function Approximation for Factored Markov Decision
Processes

**Pascal Poupart**

Department of Computer Science

University of Toronto

Toronto, ON M5S 3H5

*email:* `ppoupart@cs.toronto.edu`

**Relu Patrascu **

Dept. of Computer Science

University of Waterloo

Waterloo, ON, N2L 3G1

*email:* `rpatrasc@cs.uwaterloo.ca`

**Dale Schuurmans **

Dept. of Computer Science

University of Waterloo

Waterloo, ON, N2L 3G1

*email:* `dale@cs.uwaterloo.ca`

**Craig Boutilier **

Department of Computer Science

University of Toronto

Toronto, ON M5S 3H5

*email:* `cebly@cs.toronto.edu`

**Carlos Guestrin **

Dept. of Computer Science

Stanford University

Stanford, CA 94305-9010

*email:* `guestrin@stanford.edu`

**Abstract**

Significant recent work has focused on using linear representations to
approximate value functions for factored Markov decision processes (MDPs).
Current research has adopted linear programming as an effective
means to calculate approximations for a *given* set of basis functions,
tackling very large MDPs as a result.
However, a number of issues remain unresolved:

- How accurate are the approximations produced by linear programs?
- How hard is it to produce better approximations?
and
- Where do the basis functions come from?

To address these questions,
we first investigate the complexity of minimizing the Bellman error
of a linear value function approximation---showing that this is an
inherently hard problem.
Nevertheless, we provide a branch and bound method for
calculating Bellman error and performing
approximate policy iteration for general factored MDPs.
These methods are more accurate than linear programming, but more expensive.
We then consider linear programming itself and investigate methods for
automatically constructing sets of basis functions that allow
this approach to produce good approximations.
The techniques we develop are guaranteed to reduce L_{1} error,
but also empirically reduce Bellman error.
Although L_{1} is a weak error measure for value function approximation,
our results show that linear programming still yields competitive
approximations,
and our methods tend to build basis functions that reveal the underlying
decompositional structure of the domain.
* To appear, AAAI-02*

Return to
List of Papers