Regional Information Center for Science and Technology - Distributed Policy Evaluation Under Multiple Behavior Strategies

DocumentCode :

32151

Title :

Distributed Policy Evaluation Under Multiple Behavior Strategies

Author :

Valcarcel Macua, Sergio; Chen, Jianshu; Zazo, Santiago; Sayed, Ali H.

Author_Institution :

Dept. of Signals, Syst. & Radiocommun., Univ. Politec. de Madrid, Madrid, Spain

Volume :

Issue :

fYear :

2015

fDate :

May 2015

Firstpage :

1260

Lastpage :

1274

Abstract :

We apply diffusion strategies to develop a fully-distributed cooperative reinforcement learning algorithm in which agents in a network communicate only with their immediate neighbors to improve predictions about their environment. The algorithm can also be applied to off-policy learning, meaning that the agents can predict the response to a behavior different from the actual policies they are following. The proposed distributed strategy is efficient, with linear complexity in both computation time and memory footprint. We provide a mean-square-error performance analysis and establish convergence under constant step-size updates, which endow the network with continuous learning capabilities. The results show a clear gain from cooperation: when the individual agents can estimate the solution, cooperation increases stability and reduces bias and variance of the prediction error; but, more importantly, the network is able to approach the optimal solution even when none of the individual agents can (e.g., when the individual behavior policies restrict each agent to sample a small portion of the state space).

Keywords :

computational complexity; learning (artificial intelligence); mean square error methods; computation time; continuous learning capabilities; distributed policy evaluation; fully-distributed cooperative reinforcement learning algorithm; linear complexity; mean-square-error performance analysis; memory footprint; off-policy learning; Approximation algorithms; Equations; Linear approximation; Markov processes; Prediction algorithms; Vectors; Adaptive networks; Arrow-Hurwicz algorithm; diffusion strategies; distributed processing; gradient temporal difference; mean-square-error; reinforcement learning; saddle-point problem

fLanguage :

English

Journal_Title :

Automatic Control, IEEE Transactions on

Publisher :

IEEE

ISSN :

0018-9286

Type :

jour

DOI :

10.1109/TAC.2014.2368731

Filename :

6949624

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=32151