Georgios Chalkiadakis
Department of Computer Science
University of Toronto
Toronto, ON M5S 3H5
email: gehalk@cs.toronto.edu
Craig Boutilier
Department of Computer Science
University of Toronto
Toronto, ON M5S 3H5
email: cebly@cs.toronto.edu
Abstract
Much emphasis in multiagent reinforcement learning (MARL)
research is placed on ensuring that MARL algorithms (eventually)
converge to desirable equilibria. As in standard reinforcement
learning, convergence generally requires sufficient exploration
of strategy space. However, exploration often comes at a price
in the form of penalties or forgone opportunities. In multiagent
settings, the problem is exacerbated by the need for agents to
"coordinate" their policies on equilibria. We propose a Bayesian
model for optimal exploration in MARL problems that allows these
exploration costs to be weighed against their expected benefits using
the notion of value of information. Unlike standard RL models, this
model requires reasoning about how one's actions will influence the
behavior of other agents. We develop tractable approximations to
optimal Bayesian exploration, and report on experiments illustrating
the benefits of this approach in identical-interest games.
To appear, AAMAS-03