Title of article

Concurrent Markov decision processes for robot team learning

Author/Authors

Girard, Justin and Reza Emami, M.

Pages

From page

223

To page

234

Abstract

Multi-agent learning, in a decision-theoretic sense, may run into deficiencies if a single Markov decision process (MDP) is used to model agent behaviour. This paper discusses an approach to overcoming such deficiencies by considering a multi-agent learning problem as a concurrence between individual learning and task allocation MDPs. This approach, called Concurrent MDP (CMDP), is contrasted with other MDP models, including the decentralized MDP. The individual MDP problem is solved by a Q-Learning algorithm, guaranteed to settle on a locally optimal reward maximization policy. For the task allocation MDP, several different concurrent individual and social learning solutions are considered. Through a heterogeneous team foraging case study, it is shown that the CMDP-based learning mechanisms reduce both simulation time and total agent learning effort.

Keywords

robot team, heterogeneous team, reinforcement learning, Markov decision process, multi-agent learning

Journal title

Astroparticle Physics

Record number

2048658

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=10&DC=2048658