Title :
Timesharing-tracking: A new framework for decentralized reinforcement learning in cooperative multi-agent systems
Author :
Fu Bo ; Chen Xin ; He Yong ; Wu Min
Author_Institution :
Sch. of Inf. Sci. & Eng., Central South Univ., Changsha, China
Abstract :
This paper discusses how to learn the optimal cooperative policy in a decentralized way when the immediate individual rewards are known. We propose a timesharing-tracking framework (TTF), in which agents learn their optimal policies alternately on different states in order to realize macroscopic simultaneous learning. We then propose joint-state Q-learning with best response to companions (BRQ-learning). Further, BRQ-learning is extended into the TTF, yielding a mechanism named multi-agent BRQ-learning with timesharing-tracking (BRQL-TT) that aims to achieve the optimal group policy. Simulation results illustrate that the proposed algorithm can learn the optimal joint behavior with less computation and at a faster speed than two other classical learning algorithms.
Keywords :
learning (artificial intelligence); multi-agent systems; BRQ-learning with timesharing-tracking; BRQL-TT; TTF; best-response; cooperative multiagent systems; decentralized reinforcement learning; joint state Q-learning; macroscopic simultaneous learning; optimal cooperative policy; timesharing-tracking framework; Games; Joints; Learning (artificial intelligence); Multi-agent systems; Optimization; Robots; Switches; Cooperative learning; Immediately individual reward; Multi-agent system; Timesharing Tracking;
Conference_Title :
Control Conference (CCC), 2013 32nd Chinese
Conference_Location :
Xi'an