Process-Oriented Planning and Average-Reward Optimality

Craig Boutilier
Department of Computer Science
University of British Columbia
Vancouver, BC, CANADA, V6T 1Z4
email: cebly@cs.ubc.ca

Martin L. Puterman
Faculty of Commerce
University of British Columbia
Vancouver, BC V6T 1Z4, CANADA
email: marty@markov.commerce.ubc.ca

Abstract
To date, AI planning research, decision-theoretic planning included, has concentrated primarily on classical goal-oriented tasks. We argue that many AI planning problems should be viewed as process-oriented, where the aim is to produce a policy or behavior strategy with no termination condition in mind. While Markov decision models have gained some prominence in planning recently, their full power only becomes apparent with process-oriented problems. The question of appropriate optimality criteria becomes more critical in this case; we argue that average-reward optimality is most suitable. While the construction of average-optimal policies involves a number of subtleties and computational difficulties, certain aspects of the problem can be solved using compact action representations such as Bayes nets. In particular, we provide an algorithm that identifies the structure of the Markov process underlying a planning problem, a crucial element in constructing average-optimal policies, without explicitly enumerating the state space.
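For concreteness, the average-reward (or gain) criterion mentioned above is the standard one from the MDP literature (this formulation is not quoted from the paper itself): a policy pi is evaluated by its long-run reward per step,

    g^\pi(s) = \lim_{N \to \infty} \frac{1}{N} \, E^\pi_s\!\left[ \sum_{t=1}^{N} r(X_t, A_t) \right]

where the limit exists for stationary policies on finite-state models, and an average-optimal policy maximizes this gain at every state s. The gain of a policy is constant on each closed recurrent class of the Markov chain it induces, which is why identifying the chain structure is a prerequisite for computing average-optimal policies. The sketch below shows what that structure amounts to on an explicitly enumerated chain, using strongly connected components; it is a baseline for contrast only, not the paper's algorithm, which recovers the same information from a compact Bayes-net representation without enumerating states. The function name and the networkx dependency are illustrative choices.

import networkx as nx

def closed_recurrent_classes(edges):
    """Return the closed recurrent classes of a finite Markov chain.

    edges: iterable of (s, s') pairs with positive one-step transition
    probability under a fixed policy (the chain's support graph).
    A recurrent class is a strongly connected component with no edges
    leaving it; the gain of a policy is constant on each such class.
    """
    g = nx.DiGraph(edges)
    cond = nx.condensation(g)           # DAG whose nodes are the SCCs
    return [cond.nodes[c]["members"]    # SCCs with no outgoing edges are closed
            for c in cond.nodes
            if cond.out_degree(c) == 0]

# Example: states 0-2 form a closed recurrent class; state 3 is transient.
print(closed_recurrent_classes([(0, 1), (1, 2), (2, 0), (3, 0)]))
# -> [{0, 1, 2}]

This explicit approach requires the full state space up front; the point of the paper's structured algorithm is to obtain the same classification directly from the compact action representation.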

Appeared in IJCAI-95.
