Exploiting Structure in Policy Construction

Craig Boutilier and Richard Dearden
Department of Computer Science
University of British Columbia
Vancouver, BC, CANADA, V6T 1Z4
email: cebly@cs.ubc.ca, dearden@cs.ubc.ca

Moises Goldszmidt
Rockwell Science Center
444 High Street
Palo Alto, CA 94301, U.S.A.
email: moises@rpal.rockwell.com

Markov decision processes (MDPs) have recently been applied to the problem of modeling decision-theoretic planning. While such traditional methods for solving MDPs are often practical for small states spaces, their effectiveness for large AI planning problems is questionable. We present an algorithm, called structured policy iteration (SPI), that constructs optimal policies without explicit enumeration of the state space. The algorithm retains the fundamental computational steps of the commonly used modified policy iteration algorithm, but exploits the variable and propositional independencies in reflected in a temporal Bayesian network representation of MDPs. The principles behind SPI can be applied to any structured representation of stochastic actions, and the algorithm itself can be used in conjunction with recent approximation methods.

Appeared IJCAI-95 (see also AAAI Spring Symposium on Extending Theories of Action, Stanford, March 1995).

Return to List of Papers