Title :
Diffusion gradient temporal difference for cooperative reinforcement learning with linear function approximation
Author :
Valcarcel Macua, S. ; Belanovic, P. ; Zazo, S.
Author_Institution :
Escuela Tec. Super. de Ing. de Telecomun., Univ. Politec. de Madrid, Madrid, Spain
Abstract :
We introduce a diffusion-based algorithm in which multiple agents cooperate to predict a common, global state-value function by sharing local estimates and local gradient information among neighbors. Our algorithm is a fully distributed implementation of gradient temporal difference learning with linear function approximation, making it applicable to multiagent settings. Simulations illustrate the benefit of cooperation in learning made possible by the proposed algorithm.
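The following is a minimal illustrative sketch, not the authors' exact method: it combines standard GTD2-style updates (primary parameters theta and secondary correction parameters w) with an adapt-then-combine diffusion step, in which each agent first takes a local gradient step and then averages its estimates with its neighbors'. The ring topology, combination weights, feature map, random-walk environment, and step sizes below are all assumptions made for the example.

# Sketch of diffusion gradient TD with linear function approximation (assumptions noted above).
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS = 4               # agents cooperating over a network (assumed ring topology)
N_STATES = 10              # states of the common Markov chain
N_FEATS = 4                # dimension of the linear feature map
GAMMA = 0.9                # discount factor
ALPHA, BETA = 0.05, 0.05   # step sizes for primary/secondary iterates (illustrative)

# Shared linear feature map: one feature vector per state (assumption).
PHI = rng.normal(size=(N_STATES, N_FEATS))

# Doubly stochastic combination matrix for a ring of agents.
C = np.zeros((N_AGENTS, N_AGENTS))
for k in range(N_AGENTS):
    C[k, k] = 0.5
    C[k, (k - 1) % N_AGENTS] = 0.25
    C[k, (k + 1) % N_AGENTS] = 0.25

theta = np.zeros((N_AGENTS, N_FEATS))  # local value-function parameters
w = np.zeros((N_AGENTS, N_FEATS))      # local secondary (correction) parameters

def step(state):
    """Toy random-walk transition with a reward of +1 at the last state."""
    nxt = (state + rng.choice([-1, 1])) % N_STATES
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

states = rng.integers(N_STATES, size=N_AGENTS)  # each agent follows its own trajectory

for t in range(5000):
    theta_half = np.empty_like(theta)
    w_half = np.empty_like(w)
    # Adapt: each agent takes a local GTD2-style gradient step.
    for k in range(N_AGENTS):
        s = states[k]
        s_next, r = step(s)
        phi, phi_next = PHI[s], PHI[s_next]
        delta = r + GAMMA * phi_next @ theta[k] - phi @ theta[k]  # TD error
        theta_half[k] = theta[k] + ALPHA * (phi - GAMMA * phi_next) * (phi @ w[k])
        w_half[k] = w[k] + BETA * (delta - phi @ w[k]) * phi
        states[k] = s_next
    # Combine: each agent averages its neighbors' intermediate estimates via C.
    theta = C @ theta_half
    w = C @ w_half

print("per-agent parameter spread:", np.linalg.norm(theta - theta.mean(axis=0)))

In this sketch the diffusion (combine) step drives the agents' local parameter vectors toward consensus while the local (adapt) step drives them toward a solution of the projected Bellman equation; the printed spread gives a rough indication of how close the agents are to agreement.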
Keywords :
approximation theory; gradient methods; learning (artificial intelligence); cooperative reinforcement learning; diffusion based algorithm; diffusion gradient temporal difference; distributed implementation; global state-value function; linear function approximation; local gradient information; multiple agents; Approximation algorithms; Cost function; Function approximation; Learning; Prediction algorithms; Vectors; TD; cooperative learning; distributed control; distributed decision making; distributed reinforcement learning; distributed temporal difference; multiagent
Conference_Title :
2012 3rd International Workshop on Cognitive Information Processing (CIP)
Conference_Location :
Baiona
Print_ISBN :
978-1-4673-1877-8
DOI :
10.1109/CIP.2012.6232901