Stochastic Dynamic Programming with Factored Representations

Craig Boutilier
Department of Computer Science
University of Toronto
Toronto, ON, CANADA, M5S 3H5
email: cebly@cs.toronto.edu
Richard Dearden
Department of Computer Science
University of British Columbia
Vancouver, BC, CANADA, V6T 1Z4
email: dearden@cs.ubc.ca
Moises Goldszmidt
Computer Science Department
Stanford University
Stanford, CA 94305-9010, U.S.A.
email: moises@robotics.stanford.edu
Abstract
Markov decision processes (MDPs) have proven to be popular models
for decision-theoretic planning, but standard dynamic programming
algorithms for solving MDPs rely on explicit, state-based
specifications and computations. To alleviate the combinatorial
problems associated with such methods, we propose new representational
and computational techniques for MDPs that exploit certain types of
problem structure. We use dynamic Bayesian networks (with decision trees
representing the local families of conditional probability distributions)
to represent stochastic actions in an MDP, together with a decision-tree
representation of rewards. Based on this representation, we develop
versions of standard dynamic programming algorithms that directly
manipulate decision-tree representations of policies
and value functions. This generally obviates the need for state-by-state
computation, aggregating states at the leaves of these trees and requiring
computations only for each aggregate state. The key to these algorithms
is a decision-theoretic generalization of classic regression analysis,
in which we determine the features relevant to predicting expected
value. We demonstrate the method empirically on several
planning problems, showing significant savings for certain types of
problems. We also identify classes of problems for which this
technique performs poorly, and suggest extensions and related ideas
that may prove useful for such problems.
We also briefly describe an approximation scheme based on this
approach.
Artificial Intelligence 121(1-2):49-107, 2000.
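To make the tree-manipulation idea concrete, the following is a minimal
Python sketch of a single decision-theoretic regression step (a Bellman
backup) for one action. It assumes binary state variables and a dynamic
Bayesian network with no correlations among next-state variables; all
variable names, probabilities, and rewards are hypothetical toy values,
and this illustrates the general idea rather than the algorithms
developed in the paper (which, in particular, also simplify the
resulting trees).

    # Decision trees are nested tuples (variable, false_subtree, true_subtree);
    # leaves are numbers. States are dicts mapping variable names to 0/1.
    # All names and numbers below are hypothetical toy values.

    def merge(t1, t2, op):
        # Combine two trees leaf-wise with a binary operation. Merged trees
        # may retest a variable; evaluation is still correct, and the paper's
        # algorithms would simplify such redundancy away.
        if not isinstance(t1, tuple) and not isinstance(t2, tuple):
            return op(t1, t2)
        if isinstance(t1, tuple):
            var, lo, hi = t1
            return (var, merge(lo, t2, op), merge(hi, t2, op))
        var, lo, hi = t2
        return (var, merge(t1, lo, op), merge(t1, hi, op))

    def expected_value_tree(value_tree, cpds):
        # Build a tree over *current* variables giving E[V(s') | s, a].
        # value_tree tests next-state variables; cpds[v] is a tree over
        # current variables whose leaves are P(v' = true | s, a).
        if not isinstance(value_tree, tuple):
            return value_tree                          # constant leaf
        var, lo, hi = value_tree
        e_lo = expected_value_tree(lo, cpds)           # E[V | v' = false]
        e_hi = expected_value_tree(hi, cpds)           # E[V | v' = true]
        t_hi = merge(cpds[var], e_hi, lambda p, v: p * v)
        t_lo = merge(cpds[var], e_lo, lambda p, v: (1 - p) * v)
        return merge(t_hi, t_lo, lambda a, b: a + b)   # p*e_hi + (1-p)*e_lo

    def regress(value_tree, cpds, reward_tree, gamma=0.9):
        # One backup, computed tree-to-tree rather than state-by-state:
        # Q(s) = R(s) + gamma * E[V(s') | s, a].
        ev = expected_value_tree(value_tree, cpds)
        return merge(reward_tree, ev, lambda r, e: r + gamma * e)

    def evaluate(tree, state):
        # All states reaching the same leaf share one aggregate value.
        while isinstance(tree, tuple):
            var, lo, hi = tree
            tree = hi if state[var] else lo
        return tree

    # Toy problem: reward 1 when Y holds; the action makes Y' certain when
    # X holds and otherwise leaves Y sticky with probability 0.9.
    reward = ('Y', 0.0, 1.0)
    cpds = {'X': ('X', 0.3, 0.9),
            'Y': ('X', ('Y', 0.0, 0.9), 1.0)}

    q_tree = regress(reward, cpds, reward)
    for s in ({'X': 0, 'Y': 0}, {'X': 1, 'Y': 0}, {'X': 0, 'Y': 1}):
        print(s, round(evaluate(q_tree, s), 3))        # 0.0, 0.9, 1.81

Note that cpds['X'] is never consulted: the value tree makes no mention
of X', so the backup is computed without it, and the resulting Q-tree
tests only the features relevant to predicting expected value. Every
state agreeing on those features falls to the same leaf and is handled
by a single aggregate computation.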