Title: Multi-Agent Reinforcement Learning with Vicarious Rewards
Authors: Kevin Irwig and Wayne Wobcke
Series: Linköping Electronic Articles in Computer and Information Science
ISSN 1401-9841
Issue: Vol. 4 (1999), No. 034
URL: http://www.ep.liu.se/ea/cis/1999/034/


Reinforcement learning is the problem faced by an agent that must learn behaviour through trial-and-error interactions with a dynamic environment. In a multi-agent setting, the problem is often further complicated by the need to take into account the behaviour of other agents in order to learn to perform effectively. Issues of coordination and cooperation must be addressed; in general, it is not sufficient for each agent to act selfishly in order to arrive at a globally optimal strategy. In this work, we apply the Adaptive Heuristic Critic (AHC) and Q-learning algorithms to agents in a simple artificial multi-agent domain based on the Tileworld. We experimentally compare the performance of the AHC and Q-learning algorithms to each other as well as to a hand-coded greedy strategy. The overall result is that AHC agents perform better than the others, particularly when many other agents are present or the world is dynamic. We also examine the notion of global optimality in this system, and present a simple method of encouraging agents to learn cooperative behaviour, which we call vicarious reinforcement. The main result of this work is that agents that receive additional vicarious reinforcement perform better than selfish agents, even though the task being performed here is not inherently cooperative.
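One plausible reading of vicarious reinforcement, as the abstract describes it, is that each agent's learning signal is augmented with reward earned by the other agents. The sketch below shows a single tabular Q-learning update of this kind; the sharing weight `w`, the function name, and the data layout are illustrative assumptions, not the paper's exact formulation.

```python
from collections import defaultdict

def vicarious_q_update(Q, agent, state, action, next_state, rewards,
                       actions, alpha=0.1, gamma=0.9, w=0.5):
    """One Q-learning update for `agent`.

    The reinforcement is the agent's own reward plus a weighted sum of
    the other agents' rewards (the "vicarious" component). With w=0 this
    reduces to ordinary selfish Q-learning; w is a hypothetical sharing
    weight, not a parameter taken from the paper.
    """
    r = rewards[agent] + w * sum(rw for i, rw in enumerate(rewards)
                                 if i != agent)
    best_next = max(Q[agent][(next_state, a)] for a in actions)
    # Standard temporal-difference update toward r + gamma * max_a Q(s', a)
    Q[agent][(state, action)] += alpha * (r + gamma * best_next
                                          - Q[agent][(state, action)])
    return Q[agent][(state, action)]

# Toy usage: two agents, one shared transition.
Q = [defaultdict(float), defaultdict(float)]
v = vicarious_q_update(Q, agent=0, state='s', action=0, next_state='t',
                       rewards=[1.0, 2.0], actions=[0, 1],
                       alpha=0.5, gamma=0.9, w=0.5)
```

In this toy call the effective reward is 1.0 + 0.5 × 2.0 = 2.0, so the updated Q-value is 0.5 × 2.0 = 1.0; setting w to 0 would ignore the other agent's reward entirely.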

Original publication: 1999-12-30