Craig Boutilier and Richard Dearden
Department of Computer Science
University of British Columbia
Vancouver, BC, CANADA, V6T 1Z4
email: cebly@cs.ubc.ca, dearden@cs.ubc.ca
Moises Goldszmidt
Rockwell Science Center
444 High Street
Palo Alto, CA 94301, U.S.A.
email: moises@rpal.rockwell.com
Abstract
Markov decision processes (MDPs) have recently been applied to the
problem of modeling decision-theoretic planning.
While such traditional methods for solving MDPs are
often practical for small states spaces, their effectiveness
for large AI planning problems is questionable.
We present an algorithm, called structured policy iteration (SPI),
that constructs optimal policies without explicit enumeration of the
state space. The algorithm retains the fundamental computational
steps of the commonly used modified policy iteration algorithm,
but exploits the variable and
propositional independencies in reflected in
a temporal Bayesian network representation of MDPs.
The principles behind SPI can be applied to any structured
representation of stochastic actions, and the algorithm itself
can be used in conjunction with recent approximation methods.
Appeared IJCAI-95 (see also AAAI Spring Symposium on Extending Theories of Action, Stanford, March 1995).
Return to List of Papers