Craig Boutilier and Richard Dearden
Department of Computer Science
University of British Columbia
Vancouver, BC, CANADA, V6T 1Z4
email: cebly@cs.ubc.ca, dearden@cs.ubc.ca
Abstract
We propose and examine a method of approximate dynamic programming
for Markov decision processes based on structured problem representations.
We assume an MDP is represented using a dynamic Bayesian network,
and construct value functions using decision trees as our function
representation. The size of the representation is kept within
acceptable limits by pruning these value trees so that leaves
represent possible ranges of values, thus approximating the
value functions produced during optimization. We propose a method
for detecting convergence, prove error bounds on the resulting
approximately optimal value functions and policies, and describe
some preliminary experimental results.
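The pruning idea in the abstract can be sketched as follows. This is a minimal illustrative example, not the authors' implementation: all names (`Leaf`, `Internal`, `prune`, the tolerance `delta`) are assumptions. A subtree of the value tree is collapsed into a single leaf holding a range of values whenever the spread of values in that subtree is at most `delta`, which bounds the approximation error introduced at that leaf by `delta`.

```python
# Hypothetical sketch of value-tree pruning: leaves store value ranges,
# and subtrees with a sufficiently narrow range are merged into one leaf.
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    lo: float   # lower bound of the value range at this leaf
    hi: float   # upper bound of the value range at this leaf

@dataclass
class Internal:
    var: str                 # state variable tested at this node
    false_branch: "Tree"
    true_branch: "Tree"

Tree = Union[Leaf, Internal]

def span(t: Tree) -> tuple:
    """Return the (min, max) value range covered by a subtree."""
    if isinstance(t, Leaf):
        return t.lo, t.hi
    fl, fh = span(t.false_branch)
    tl, th = span(t.true_branch)
    return min(fl, tl), max(fh, th)

def prune(t: Tree, delta: float) -> Tree:
    """Collapse any subtree whose value range is narrower than delta
    into a single range leaf; the error at that leaf is at most delta."""
    if isinstance(t, Leaf):
        return t
    t = Internal(t.var, prune(t.false_branch, delta), prune(t.true_branch, delta))
    lo, hi = span(t)
    if hi - lo <= delta:
        return Leaf(lo, hi)
    return t
```

For example, pruning a tree whose leaf values are 1.0, 1.2, and 5.0 with a small `delta` leaves it unchanged, while a `delta` of 4.5 collapses the whole tree into the single range leaf [1.0, 5.0].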
Appeared in ICML-96.