DocumentCode :
32151
Title :
Distributed Policy Evaluation Under Multiple Behavior Strategies
Author :
Valcarcel Macua, Sergio ; Chen, Jianshu ; Zazo, Santiago ; Sayed, Ali H.
Author_Institution :
Dept. of Signals, Syst. & Radiocommun., Univ. Politec. de Madrid, Madrid, Spain
Volume :
60
Issue :
5
fYear :
2015
fDate :
May 2015
Firstpage :
1260
Lastpage :
1274
Abstract :
We apply diffusion strategies to develop a fully-distributed cooperative reinforcement learning algorithm in which agents in a network communicate only with their immediate neighbors to improve predictions about their environment. The algorithm can also be applied to off-policy learning, meaning that the agents can predict the response to a behavior different from the actual policies they are following. The proposed distributed strategy is efficient, with linear complexity in both computation time and memory footprint. We provide a mean-square-error performance analysis and establish convergence under constant step-size updates, which endow the network with continuous learning capabilities. The results show a clear gain from cooperation: when the individual agents can estimate the solution, cooperation increases stability and reduces bias and variance of the prediction error; but, more importantly, the network is able to approach the optimal solution even when none of the individual agents can (e.g., when the individual behavior policies restrict each agent to sample a small portion of the state space).
Keywords :
computational complexity; learning (artificial intelligence); mean square error methods; computation time; continuous learning capabilities; distributed policy evaluation; fully-distributed cooperative reinforcement learning algorithm; linear complexity; mean-square-error performance analysis; memory footprint; off-policy learning; Approximation algorithms; Equations; Linear approximation; Markov processes; Prediction algorithms; Vectors; Adaptive networks; Arrow-Hurwicz algorithm; diffusion strategies; distributed processing; gradient temporal difference; mean-square-error; reinforcement learning; saddle-point problem
fLanguage :
English
Journal_Title :
Automatic Control, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9286
Type :
jour
DOI :
10.1109/TAC.2014.2368731
Filename :
6949624