APRICODD: Approximate Policy Construction Using Decision Diagrams

Robert St-Aubin and Jesse Hoey
Department of Computer Science
University of British Columbia
Vancouver, BC, CANADA, V6T 1Z4
email: staubin,jhoey@cs.ubc.ca

Craig Boutilier
Department of Computer Science
University of Toronto
Toronto, ON M5S 3H5
email: cebly@cs.toronto.edu

We propose a method of approximate dynamic programming for Markov decision processes (MDPs) using algebraic decision diagrams (ADDs). We produce near-optimal value functions and policies with much lower time and space requirements than exact dynamic programming. Our method reduces the sizes of the intermediate value functions generated during value iteration by replacing the values at the terminals of the ADD with ranges of values. A representation using decision diagrams is well-suited to this task, for similar values can be directly merged without the need for reordering variables. Our method is demonstrated on a class of large MDPS (with up to 34 billion states), and we compare the results with the optimal value functions.

To appear, NIPS-2000, Denver, Novermber, 2000

Return to List of Papers