Craig Boutilier
Department of Computer Science
University of British Columbia
Vancouver, BC, CANADA, V6T 1Z4
email: cebly@cs.ubc.ca
Martin L. Puterman
Faculty of Commerce
University of British Columbia
Vancouver, BC V6T 1Z4, CANADA
email: marty@markov.commerce.ubc.ca
Abstract
To date, AI planning research, decision-theoretic planning
included, has concentrated primarily on classical goal-oriented tasks.
We argue that many AI planning problems should be viewed as
process-oriented, where the aim is to produce a policy or
behavior strategy with no termination condition in mind. While
Markov decision models have gained some prominence in planning recently,
their full power only becomes apparent with process-oriented
problems. The question of appropriate optimality criteria becomes
more critical in this case; we argue that
average-reward optimality is most
suitable. While construction of average-optimal policies
involves a number of subtleties and computational
difficulties, certain aspects of the problem can be solved
using compact action representations such as Bayes nets.
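As an illustration of the criterion advocated above, the average-reward (or gain) criterion is standardly defined as follows; the notation here follows common MDP usage and is assumed rather than taken from this abstract:

```latex
g^{\pi}(s) \;=\; \lim_{N \to \infty} \frac{1}{N}\,
  E^{\pi}\!\left[\, \sum_{t=0}^{N-1} r(S_t, A_t) \;\middle|\; S_0 = s \right]
```

Here $\pi$ is the policy being evaluated, $r(S_t, A_t)$ the reward at stage $t$, and $S_0 = s$ the starting state; for policies under which the limit need not exist, the limit is typically replaced by $\liminf$.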
In particular, we provide an algorithm that identifies the
structure of the Markov process underlying a planning
problem, a crucial element in constructing average-optimal policies,
without explicit enumeration of the underlying state space.
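To make concrete what "structure of the Markov process" means here, the sketch below classifies the recurrent classes of a finite Markov chain from its transition graph: a communicating class is recurrent exactly when it is closed (no positive-probability transition leaves it). This is a naive, state-enumerating illustration for contrast only; the paper's contribution is to recover this structure from a compact Bayes-net representation without such enumeration. The function name and dictionary encoding are assumptions of this sketch, not the paper's.

```python
def recurrent_classes(P):
    """Classify the recurrent classes of a finite Markov chain.

    P maps each state to the set of states reachable in one step with
    positive probability. Returns the list of recurrent classes (sets).
    """
    def reach(s):
        # All states reachable from s (including s itself), by DFS.
        seen, frontier = {s}, [s]
        while frontier:
            u = frontier.pop()
            for v in P.get(u, ()):
                if v not in seen:
                    seen.add(v)
                    frontier.append(v)
        return seen

    R = {s: reach(s) for s in P}

    # Communicating classes: s and t communicate iff each reaches the other.
    classes, assigned = [], set()
    for s in P:
        if s in assigned:
            continue
        cls = {t for t in R[s] if s in R[t]}
        assigned |= cls
        classes.append(cls)

    # A class is recurrent iff it is closed: nothing outside it is reachable.
    return [c for c in classes if all(R[s] <= c for s in c)]
```

For example, in a chain where states a and b communicate but can leak into an absorbing state c, the class {a, b} is transient and {c} is the sole recurrent class. Whether the chain has one recurrent class (unichain) or several (multichain) is exactly the distinction that drives the choice of average-reward solution method.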
Appeared in IJCAI-95.