Craig Boutilier, Ray Reiter, Bob Price
Department of Computer Science
University of Toronto
Toronto, ON M5S 3H5
email: cebly,reiter,bprice@cs.toronto.edu
Abstract
We present a dynamic programming approach for the solution of
first-order Markov decision processes. This technique operates on
an MDP whose dynamics are represented in a variant of the situation
calculus that allows for stochastic actions.
It produces a
logical description of the optimal value function and policy
by constructing a set of first-order formulae that minimally
partition the state space according to the distinctions made by the
value function and policy. This is achieved through an operation
known as decision-theoretic regression. In effect, our algorithm
performs value iteration without explicit enumeration of either the
state or action spaces of the MDP. This allows problems involving
relational fluents and quantification to be solved without requiring
explicit state space enumeration or conversion to propositional form.
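For contrast with the symbolic approach described above, the following is a minimal sketch of classical value iteration on an explicitly enumerated MDP, the baseline the paper's method avoids. The states, actions, transition model, and rewards below are a hypothetical toy example, not drawn from the paper.

```python
def value_iteration(states, actions, P, R, gamma=0.9, eps=1e-6):
    """Classical enumerated value iteration.

    P[s][a] is a list of (probability, next_state) pairs;
    R[s][a] is the immediate reward for taking a in s.
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman backup: best expected one-step return plus
            # discounted value of the successor state.
            best = max(
                R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a])
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V

# Two-state toy MDP: "stay" keeps the current state, "move" flips it;
# being in state 1 yields reward 1, state 0 yields 0.
states, actions = [0, 1], ["stay", "move"]
P = {s: {"stay": [(1.0, s)], "move": [(1.0, 1 - s)]} for s in states}
R = {s: {a: float(s) for a in actions} for s in states}
V = value_iteration(states, actions, P, R)
```

The point of the symbolic algorithm is that this per-state loop, which touches every ground state explicitly, is replaced by regression over first-order formulae that describe whole regions of state space at once.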
To appear, IJCAI-01