Robert St-Aubin and Jesse Hoey
Department of Computer Science
University of British Columbia
Vancouver, BC, CANADA, V6T 1Z4
email: staubin,jhoey@cs.ubc.ca
Craig Boutilier
Department of Computer Science
University of Toronto
Toronto, ON M5S 3H5
email: cebly@cs.toronto.edu
Abstract
We propose a method of approximate dynamic programming
for Markov decision processes (MDPs) using algebraic decision
diagrams (ADDs). We produce near-optimal value functions and policies
with much lower time and space requirements than exact dynamic programming.
Our method reduces the sizes of the intermediate value functions
generated during value iteration
by replacing the values at the terminals of the ADD
with ranges of values. A representation using decision diagrams is
well-suited to this task, for similar values can be directly merged
without the need for reordering variables.
Our method is demonstrated on a class of large MDPS (with up
to 34 billion states), and we compare the results
with the optimal value functions.
To appear, NIPS-2000, Denver, Novermber, 2000
Return to List of Papers