Darius Braziunas
Department of Computer Science
University of Toronto
Toronto, ON M5S 3H5
email: darius@cs.toronto.edu
Craig Boutilier
Department of Computer Science
University of Toronto
Toronto, ON M5S 3H5
email: cebly@cs.toronto.edu
Abstract
The search for finite-state controllers for partially observable
Markov decision processes (POMDPs) is often based on approaches like
gradient ascent, attractive because of their relatively low
computational cost. In this paper, we illustrate a basic problem
with gradient-based methods applied to POMDPs, one that stems from the
sequential nature of the decision problem, and propose a new
stochastic local search method as an alternative. The heuristics
used in our procedure mimic the sequential reasoning inherent in
optimal dynamic programming (DP) approaches. We show that our
algorithm consistently finds higher quality controllers than
gradient ascent, and is competitive with (and, for some problems,
superior to) other state-of-the-art controller and DP-based
algorithms on large-scale POMDPs.
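To make the gradient-based baseline concrete, the following is a minimal, illustrative sketch (not the paper's algorithm, and not its benchmark problems): gradient ascent on the parameter of a one-node stochastic finite-state controller for a tiny, fully aliased two-state POMDP. The model numbers, the softmax parameterization, and the finite-difference gradient are all assumptions made for this example.

```python
# Illustrative sketch: gradient ascent on a one-parameter stochastic
# controller for a made-up 2-state, 2-action POMDP whose single
# observation reveals nothing about the state.
import math

GAMMA = 0.9
# P[a][s][t]: transition probabilities; R[a][s]: rewards (assumed values).
P = [[[0.9, 0.1], [0.1, 0.9]],   # action 0
     [[0.1, 0.9], [0.9, 0.1]]]   # action 1
R = [[1.0, 0.0],                 # action 0
     [0.0, 2.0]]                 # action 1

def policy(theta):
    """Softmax over the two actions; the observation carries no information,
    so the controller must commit to one stochastic action distribution."""
    e0, e1 = math.exp(theta), math.exp(-theta)
    return [e0 / (e0 + e1), e1 / (e0 + e1)]

def value(theta, start=(0.5, 0.5), iters=500):
    """Expected discounted reward of the controller, by value iteration."""
    pi = policy(theta)
    V = [0.0, 0.0]
    for _ in range(iters):
        V = [sum(pi[a] * (R[a][s] + GAMMA * sum(P[a][s][t] * V[t]
                                                for t in range(2)))
                 for a in range(2)) for s in range(2)]
    return sum(b * v for b, v in zip(start, V))

def ascend(theta=0.0, lr=0.5, steps=100, eps=1e-4):
    """Plain finite-difference gradient ascent on the controller parameter."""
    for _ in range(steps):
        grad = (value(theta + eps) - value(theta - eps)) / (2 * eps)
        theta += lr * grad
    return theta

theta = ascend()
```

Even on this toy problem the landscape is benign, so ascent steadily shifts probability toward the higher-reward action; the paper's point is that on genuinely sequential POMDPs such myopic hill-climbing can stall, which motivates its DP-style local search heuristics.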
To appear, AAAI-04