Scott Sanner
Department of Computer Science
University of Toronto
Toronto, ON M5S 3H5
email: ssanner@cs.toronto.edu
Craig Boutilier
Department of Computer Science
University of Toronto
Toronto, ON M5S 3H5
email: cebly@cs.toronto.edu
Abstract
Recent work on approximate linear programming (ALP) techniques for
first-order Markov Decision Processes (FOMDPs) represents the value
function linearly w.r.t. a set of first-order basis functions and uses
linear programming techniques to determine suitable weights. This
approach offers the advantage that it does not require simplification
of the first-order value function, and allows one to solve FOMDPs
independently of a specific domain instantiation. In this paper, we
address several questions to enhance the applicability of this work:
(1) Can we extend the first-order ALP framework to approximate policy
iteration and, if so, how do the two algorithms compare?
(2) Can we automatically generate basis functions and evaluate their
impact on value function quality? (3) How can we decompose
intractable problems with universally quantified rewards into
tractable subproblems? We propose answers to these questions along
with a number of novel optimizations and provide a comparative
empirical evaluation on problems from the ICAPS 2004
Probabilistic Planning Competition.
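
As a sketch of the linear representation the abstract refers to (the symbols V, w_i, and b_i are our notation, not drawn from the paper): the value function is approximated as a weighted combination of first-order basis functions,

\[
V(s) \;\approx\; \sum_{i=1}^{k} w_i \, b_i(s),
\]

where each \(b_i\) is a first-order basis function defined independently of any particular domain instantiation, and the weights \(w_i\) are the quantities determined by the linear programming techniques mentioned above.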
To appear, UAI-06