Pascal Poupart
Department of Computer Science
University of Toronto
Toronto, ON M5S 3H5
email: ppoupart@cs.toronto.edu
Craig Boutilier
Department of Computer Science
University of Toronto
Toronto, ON M5S 3H5
email: cebly@cs.toronto.edu
Abstract
We describe a new approximation algorithm for solving partially observable
MDPs. Our bounded policy iteration approach searches through the space of
bounded-size, stochastic finite state controllers, combining several
advantages of gradient ascent (efficiency, search through restricted
controller space) and policy iteration (less vulnerability to local optima).
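The abstract gives no equations, so the following is only a minimal sketch of one building block of policy iteration over controllers: evaluating a fixed stochastic finite state controller on a POMDP. It assumes a common parameterization in which each controller node n chooses action a with probability psi(a|n) and, after observation o, moves to node n' with probability eta(n'|n,a,o). The value V(n,s) of starting in node n when the hidden state is s then satisfies V(n,s) = sum_a psi(a|n) [R(s,a) + gamma * sum_s' T(s'|s,a) sum_o O(o|s',a) sum_n' eta(n'|n,a,o) V(n',s')]. The function name evaluate_fsc, the dictionary layouts, and the toy model are hypothetical. The paper's controller-improvement step is not reproduced here.

```python
"""Iterative policy evaluation for a stochastic finite state controller (FSC)
on a POMDP. Hypothetical sketch; not the paper's implementation."""


def evaluate_fsc(states, actions, obs, nodes, T, O, R, psi, eta,
                 gamma=0.95, tol=1e-9, max_iter=10_000):
    """Return V[(n, s)], the expected discounted reward of running the
    controller from node n when the true state is s.

    Sparse nested dicts (missing entries mean probability 0):
      T[s][a][s2]       = P(s2 | s, a)
      O[a][s2][o]       = P(o | s2, a)
      R[s][a]           = immediate reward
      psi[n][a]         = P(a | n)            stochastic action choice
      eta[n][a][o][n2]  = P(n2 | n, a, o)     stochastic node transition
    """
    V = {(n, s): 0.0 for n in nodes for s in states}
    for _ in range(max_iter):
        new_V = {}
        delta = 0.0
        for n in nodes:
            for s in states:
                total = 0.0
                for a in actions:
                    pa = psi[n].get(a, 0.0)
                    if pa == 0.0:
                        continue
                    # Expected value of the successor (node, state) pair.
                    future = 0.0
                    for s2 in states:
                        pt = T[s][a].get(s2, 0.0)
                        if pt == 0.0:
                            continue
                        for o in obs:
                            po = O[a][s2].get(o, 0.0)
                            if po == 0.0:
                                continue
                            for n2, pn in eta[n][a][o].items():
                                future += pt * po * pn * V[(n2, s2)]
                    total += pa * (R[s][a] + gamma * future)
                new_V[(n, s)] = total
                delta = max(delta, abs(total - V[(n, s)]))
        V = new_V
        if delta < tol:  # Bellman backups have (approximately) converged
            break
    return V


if __name__ == "__main__":
    # Toy model (hypothetical): one state, two actions, one observation,
    # and one node that picks each action with probability 0.5.
    states, actions, obs, nodes = ["s"], ["a0", "a1"], ["o"], ["n"]
    T = {"s": {"a0": {"s": 1.0}, "a1": {"s": 1.0}}}
    O = {"a0": {"s": {"o": 1.0}}, "a1": {"s": {"o": 1.0}}}
    R = {"s": {"a0": 1.0, "a1": 0.0}}
    psi = {"n": {"a0": 0.5, "a1": 0.5}}
    eta = {"n": {a: {"o": {"n": 1.0}} for a in actions}}
    V = evaluate_fsc(states, actions, obs, nodes, T, O, R, psi, eta)
    print(round(V[("n", "s")], 6))  # 0.5 / (1 - 0.95) = 10.0
```

Iterating the backup to a fixed point keeps the sketch dependency-free; for larger controllers one would typically solve the same linear system directly instead.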
To appear, NIPS-03