Reinforcement Learning

Decision making using Markov decision processes (MDPs) requires an accurate model of the dynamics of the system within which the agent operates. This information is often unavailable, especially in situations involving robotic agents in novel domains. For this reason, a promising approach to programming robots is reinforcement learning (RL), a method by which an agent learns to associate a value with the execution of actions in different states.
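
As a concrete illustration, here is a minimal sketch of tabular Q-learning, one standard way of learning such state-action values from experience. The toy corridor environment, the reward of 1 at the goal, and all constants are my own assumptions for illustration, not part of any project described here.

    import random

    N = 5                       # 1-D corridor of N states; goal at the right end
    ACTIONS = [-1, +1]          # move left or move right
    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

    # Q[s][a] estimates the value of taking action a in state s.
    Q = {s: {a: 0.0 for a in ACTIONS} for s in range(N)}

    def step(s, a):
        """Apply action a in state s; reward 1 only on reaching the goal."""
        s2 = max(0, min(N - 1, s + a))
        return s2, (1.0 if s2 == N - 1 else 0.0)

    for episode in range(500):
        s = 0
        while s != N - 1:
            # epsilon-greedy action selection
            if random.random() < EPSILON:
                a = random.choice(ACTIONS)
            else:
                a = max(Q[s], key=Q[s].get)
            s2, r = step(s, a)
            # temporal-difference update toward the one-step backup
            Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2].values()) - Q[s][a])
            s = s2

    # print the learned (greedy) policy for the non-goal states
    print({s: max(Q[s], key=Q[s].get) for s in range(N - 1)})

Notice that no model of the dynamics is ever built: the agent updates its value estimates directly from sampled transitions.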

One important topic is the development of good model-based theories of RL that learn and use concise, natural models of system dynamics and reward, such as Bayesian networks. Such models allow good generalization of the learned value function and more efficient solution of the underlying MDP. Furthermore, they provide an ideal framework for incorporating prior knowledge into RL algorithms (how do you keep your robot from running over cliffs during training, anyway?). One of my students, Richard Dearden, is actively pursuing this topic.
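
To make the contrast with the model-free sketch above concrete, here is a minimal sketch of the model-based idea: estimate transition probabilities and expected rewards from experience counts, then solve the estimated MDP by value iteration. The toy stochastic chain and all constants are illustrative assumptions of mine; this is not Dearden's algorithm, and in a realistic state space a structured model such as a Bayesian network would replace the flat count table.

    import random
    from collections import defaultdict

    N, ACTIONS, GAMMA = 4, [0, 1], 0.9
    counts = defaultdict(lambda: defaultdict(int))   # counts[(s, a)][s2]
    rsum = defaultdict(float)                        # total reward seen for (s, a)

    def true_step(s, a):
        """Hidden stochastic dynamics: action 1 advances with prob 0.8; action 0 resets."""
        if a == 1:
            s2 = min(N - 1, s + 1) if random.random() < 0.8 else s
        else:
            s2 = 0
        return s2, (1.0 if s2 == N - 1 else 0.0)

    # Sweep every (state, action) pair repeatedly to build the model.
    for _ in range(500):
        for s in range(N):
            for a in ACTIONS:
                s2, r = true_step(s, a)
                counts[(s, a)][s2] += 1
                rsum[(s, a)] += r

    # Value iteration on the *estimated* model (empirical mean reward plus
    # discounted expected next-state value under the empirical frequencies).
    V = [0.0] * N
    for _ in range(100):
        V = [max((rsum[(s, a)] + GAMMA * sum(n * V[s2]
                  for s2, n in counts[(s, a)].items()))
                 / sum(counts[(s, a)].values())
                 for a in ACTIONS)
             for s in range(N)]
    print([round(v, 2) for v in V])

Prior knowledge fits naturally here: forbidden transitions or known rewards can simply be written into the model before any experience is gathered.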

An important application of RL is in the area of multiagent systems. In such a setting, RL can be used to coordinate the actions of cooperative agents (the setting that I am primarily interested in). Furthermore, in such settings, we can imagine one agent learning how to act more quickly by observing the actions of another, more experienced agent, or agents learning how to convey their knowledge and intentions more explicitly to their cohorts. This last, very general topic is being investigated by one of my students, Bob Price. For more details, see the project description Multiagent Systems.
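
To give the flavor of learning by observation, here is a hedged sketch in which a learner applies ordinary Q-learning backups both to its own exploratory steps and to transitions it watches a more experienced mentor take. The assumption that the mentor's actions and rewards are fully observable, and the toy corridor itself, are mine for illustration; this is not a description of the mechanism Bob is developing.

    import random

    N, ACTIONS = 6, [-1, +1]
    ALPHA, GAMMA = 0.2, 0.9
    Q = {s: {a: 0.0 for a in ACTIONS} for s in range(N)}

    def step(s, a):
        s2 = max(0, min(N - 1, s + a))
        return s2, (1.0 if s2 == N - 1 else 0.0)

    def backup(s, a, r, s2):
        """One-step Q-learning backup, applied to own and observed transitions alike."""
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2].values()) - Q[s][a])

    for episode in range(200):
        # The learner's own (random) exploration...
        s = random.randrange(N)
        for _ in range(10):
            a = random.choice(ACTIONS)
            s2, r = step(s, a)
            backup(s, a, r, s2)
            s = s2
        # ...plus backups along a watched mentor trajectory (the mentor,
        # assumed competent, always moves right toward the goal).
        s = 0
        while s != N - 1:
            s2, r = step(s, +1)
            backup(s, +1, r, s2)
            s = s2

    print({s: max(Q[s], key=Q[s].get) for s in range(N - 1)})

The observed trajectories concentrate backups along useful paths, so the learner's values converge faster than they would from random exploration alone.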

Over the last couple of years I have been dabbling with the problem of getting mobile robots to learn to play soccer. Actually, some students have been doing most of the dabbling, while I observe and occasionally interject with some advice. These include Roger Ford (M.Sc. in 1994), Weng-Keen Wong and Sam Heath. We have been using the Dynamite Testbed, developed by a number of people here at the UBC Laboratory for Computational Intelligence, which consists of a set of radio-controlled cars in an environment where they attempt to strike a ball into a goal (and prevent the opposition from scoring). We have exploited the "soccer-playing" behaviors developed by Michael Sahota and used RL to coordinate these behaviors. The results have been somewhat successful. Our basic strategy has been to develop a small number of easily tested predicates that carve the high-dimensional, continuous input space into more manageable chunks (e.g., predicates such as "Am I closer to the ball than any opposing car?"); a sketch of this trick appears below. Of course, in retrospect, we probably should have used the RoboCup simulator in order to gauge our results in a more objective fashion!
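
Here is a small sketch of that predicate trick: a few boolean tests compress raw continuous positions into a discrete key suitable for a tabular method over behaviors. The particular predicates, thresholds, and field geometry are illustrative assumptions of mine, not the ones actually used on the Dynamite Testbed.

    import math

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    def abstract_state(me, ball, opponents):
        """Map raw (x, y) positions to a tuple of boolean predicates."""
        closest_to_ball = all(dist(me, ball) < dist(o, ball) for o in opponents)
        ball_in_our_half = ball[0] < 0.0       # field spans x in [-1, 1]
        behind_ball = me[0] < ball[0]          # attacking toward x = +1
        near_ball = dist(me, ball) < 0.1
        return (closest_to_ball, ball_in_our_half, behind_ball, near_ball)

    # Four predicates yield only 2**4 = 16 abstract states, so a Q-table
    # over (abstract state, behavior) pairs stays tiny.
    s = abstract_state(me=(-0.2, 0.1), ball=(0.3, 0.0),
                       opponents=[(0.6, -0.2), (0.8, 0.4)])
    print(s)   # -> (False, False, True, False)

The payoff is that RL over the abstract states needs only a handful of samples per state, at the cost of whatever distinctions the predicates fail to capture.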



Craig Boutilier, cebly@cs.ubc.ca