Pascal Poupart
Department of Computer Science
University of Toronto
Toronto, ON M5S 3H5
email: ppoupart@cs.toronto.edu
Craig Boutilier
Department of Computer Science
University of Toronto
Toronto, ON M5S 3H5
email: cebly@cs.toronto.edu
Abstract
Existing algorithms for discrete partially observable Markov decision
processes can at best solve problems of a few thousand states due to
two important sources of intractability: the curse of dimensionality and
the policy space complexity. This paper describes a new algorithm
(VDCBPI) that mitigates both sources of intractability by combining the
Value Directed Compression (VDC) technique [13] with Bounded Policy
Iteration (BPI) [14]. The scalability of VDCBPI is demonstrated on
synthetic network management problems with up to 33 million states.
To appear, NIPS-04
Return to List of Papers