Bob Price and Craig Boutilier
Department of Computer Science
University of British Columbia
Vancouver, BC, CANADA, V6T 1Z4
email: {price,cebly}@cs.ubc.ca
Abstract
Imitation is actively being studied as an effective means of learning in
multi-agent environments. It allows an agent to learn how to act
well (perhaps optimally)
by observing the actions of cooperative teachers or more
experienced agents. We propose a straightforward imitation mechanism
called model extraction that can be integrated easily into
standard model-based reinforcement learning algorithms. Roughly, by
observing a mentor with similar capabilities, an agent can extract
information about its own capabilities in unvisited parts of the state
space. The extracted information can accelerate learning
dramatically. We illustrate the benefits of model extraction by
integrating it with prioritized sweeping and demonstrating improved
performance and convergence through observation of both single and
multiple mentors. Though we make some stringent assumptions regarding
observability, possible interactions, and common abilities, we briefly
comment on extensions to the model that relax these assumptions.
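
As an illustrative sketch (not the paper's implementation), the following Python code shows one way model extraction might be folded into a tabular prioritized-sweeping learner: mentor observations are simply pooled with the agent's own experience when estimating the transition and reward model, under the simplifying assumptions that the mentor's actions and rewards are observable and that mentor and agent share the same dynamics. All class and method names are hypothetical.

    import heapq
    import itertools
    from collections import defaultdict

    class ModelExtractionAgent:
        """Tabular prioritized sweeping that also learns from mentor transitions.
        Hypothetical sketch: names, parameters, and the assumption that mentor
        actions/rewards are observable are illustrative simplifications."""

        def __init__(self, actions, gamma=0.95, theta=1e-4, n_backups=20):
            self.actions = list(actions)
            self.gamma = gamma
            self.theta = theta              # minimum priority worth queueing
            self.n_backups = n_backups      # backups performed per observation
            self.counts = defaultdict(lambda: defaultdict(int))  # (s,a) -> {s': n}
            self.r_sum = defaultdict(float)                      # (s,a) -> summed reward
            self.V = defaultdict(float)                          # state value estimates
            self.pred = defaultdict(set)                         # s' -> {(s,a)} predecessors
            self.queue = []                                      # max-heap via negated priority
            self._tick = itertools.count()                       # heap tie-breaker

        def observe_self(self, s, a, s2, r):
            """Update the model from the agent's own experience."""
            self._record(s, a, s2, r)
            self._sweep()

        def observe_mentor(self, s, a, s2, r):
            """Model extraction: an observed mentor transition is treated as
            evidence about the agent's own dynamics in state s (homogeneity)."""
            self._record(s, a, s2, r)
            self._sweep()

        def _record(self, s, a, s2, r):
            self.counts[(s, a)][s2] += 1
            self.r_sum[(s, a)] += r
            self.pred[s2].add((s, a))
            self._push(s)

        def _q(self, s, a):
            dist = self.counts[(s, a)]
            n = sum(dist.values())
            if n == 0:
                return 0.0
            r = self.r_sum[(s, a)] / n
            return r + self.gamma * sum(c / n * self.V[s2] for s2, c in dist.items())

        def _push(self, s):
            priority = abs(max(self._q(s, a) for a in self.actions) - self.V[s])
            if priority > self.theta:
                heapq.heappush(self.queue, (-priority, next(self._tick), s))

        def _sweep(self):
            for _ in range(self.n_backups):
                if not self.queue:
                    break
                _, _, s = heapq.heappop(self.queue)
                self.V[s] = max(self._q(s, a) for a in self.actions)
                for (sp, _ap) in self.pred[s]:   # propagate value change backwards
                    self._push(sp)

In this sketch the only difference between self and mentor experience is the source of the observation; the paper's setting, in which only the mentor's state transitions are visible, would additionally require inferring or marginalizing over the mentor's unobserved action.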
To appear, ICML-99