Title :
CLEAN Rewards to Improve Coordination by Removing Exploratory Action Noise
Author :
HolmesParker, Chris ; Taylor, Matthew E. ; Agogino, Adrian K. ; Tumer, Kagan
Author_Institution :
Parflux LLC, Salem, OR, USA
Abstract :
Coordinating the joint actions of agents in cooperative multiagent systems is a difficult problem in many real-world domains. Learning in such multiagent systems can be slow because an agent may need not only to learn how to behave in a complex environment, but also to account for the actions of other learning agents. The inability of an agent to distinguish between the true environmental dynamics and those caused by the stochastic exploratory actions of other agents creates noise in each agent's reward signal. This learning noise can have unforeseen and often undesirable effects on the resultant system performance. We define such noise as exploratory action noise, demonstrate the critical impact it can have on the learning process in multiagent settings, and introduce a reward structure to effectively remove such noise from each agent's reward signal. In particular, we introduce two types of Coordinated Learning without Exploratory Action Noise (CLEAN) rewards that allow an agent to estimate the counterfactual reward it would have received had it taken an alternative action. We empirically show that agents using CLEAN rewards outperform agents using both traditional global rewards and shaped difference rewards in two domains.
Keywords :
learning (artificial intelligence); multi-agent systems; signal denoising; CLEAN rewards; agent joint-actions coordination; agent reward signal; cooperative multiagent systems; coordinated learning without exploratory action noise; counterfactual reward; critical impact; exploratory action noise removal; learning agents; learning noise; learning process; reward structure; stochastic exploratory actions; system performance; true environmental dynamics; Equations; Joints; Learning (artificial intelligence); Mathematical model; Multi-agent systems; Noise; System performance; MAV communication network; UAV communication network; UAV coordination; multiagent UAV; multiagent coordination; multiagent learning; reward shaping; shaped rewards;
Conference_Title :
Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on
Conference_Location :
Warsaw
DOI :
10.1109/WI-IAT.2014.159