Title of article :
Concurrent Markov decision processes for robot team learning
Author/Authors :
Girard, Justin and Reza Emami, M.
Abstract :
Multi-agent learning, in a decision-theoretic sense, may run into deficiencies if a single Markov decision process (MDP) is used to model agent behaviour. This paper discusses an approach to overcoming such deficiencies by treating a multi-agent learning problem as a concurrence between individual learning MDPs and task allocation MDPs. This approach, called Concurrent MDP (CMDP), is contrasted with other MDP models, including the decentralized MDP. The individual MDP problem is solved by a Q-Learning algorithm, which is guaranteed to settle on a locally optimal reward-maximization policy. For the task allocation MDP, several different concurrent individual and social learning solutions are considered. Through a heterogeneous team foraging case study, it is shown that the CMDP-based learning mechanisms reduce both simulation time and total agent learning effort.
Keywords :
robot team, heterogeneous team, reinforcement learning, Markov decision process, multi-agent learning
Journal title :
Astroparticle Physics