DocumentCode :
1799303
Title :
Accelerated gradient temporal difference learning algorithms
Author :
Meyer, David ; Degenne, Remy ; Omrane, Ahmed ; Hao Shen
Author_Institution :
Inst. for Data Process., Tech. Univ. München, Munich, Germany
fYear :
2014
fDate :
9-12 Dec. 2014
Firstpage :
1
Lastpage :
8
Abstract :
In this paper we study Temporal Difference (TD) learning with linear value function approximation. The classic TD algorithm is known to be unstable under the combination of linear function approximation and off-policy learning. Recently developed Gradient TD (GTD) algorithms address this problem successfully. Despite their good scalability and convergence to correct solutions, they inherit the potential weakness of slow convergence, as they are stochastic gradient descent algorithms. Accelerated stochastic gradient descent methods have been developed to speed up convergence while keeping computational complexity low. In this work, we develop an accelerated stochastic gradient descent method for minimizing the Mean Squared Projected Bellman Error (MSPBE), and derive a bound on the Lipschitz constant of the gradient of the MSPBE, which plays a critical role in our proposed accelerated GTD algorithms. Comprehensive numerical experiments demonstrate promising performance on the policy evaluation problem in comparison to the GTD algorithm family. In particular, accelerated TDC surpasses state-of-the-art algorithms.
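The abstract describes accelerating a GTD-family method (TDC) for minimizing the MSPBE. As a rough illustration of the idea, the sketch below adds Nesterov-style momentum to the standard two-timescale TDC update; the momentum scheme, step sizes, and function name are assumptions for illustration, not the authors' exact algorithm, which derives its step sizes from the paper's bound on the Lipschitz constant of the MSPBE gradient.

```python
import numpy as np

def accelerated_tdc(transitions, n_features, alpha=0.05, beta=0.1,
                    gamma=0.9, momentum=0.8):
    """Illustrative sketch: TDC with Nesterov-style momentum on the
    primary weights theta (hypothetical variant, not the paper's method).

    transitions: iterable of (phi, reward, phi_next) feature tuples.
    """
    theta = np.zeros(n_features)  # primary (value-function) weights
    w = np.zeros(n_features)      # secondary weights (gradient correction)
    v = np.zeros(n_features)      # momentum buffer for theta
    for phi, r, phi_next in transitions:
        look = theta + momentum * v                        # look-ahead point
        delta = r + gamma * look @ phi_next - look @ phi   # TD error
        # TDC direction: TD update minus gradient-correction term
        grad = delta * phi - gamma * (w @ phi) * phi_next
        v = momentum * v + alpha * grad                    # momentum step
        theta = theta + v
        w = w + beta * (delta - w @ phi) * phi             # track E[delta|s]
    return theta
```

On a tiny two-state deterministic chain with tabular features, the estimates approach the TD fixed point; the momentum term is what the acceleration contributes over plain stochastic gradient steps.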
Keywords :
computational complexity; function approximation; gradient methods; learning (artificial intelligence); mean square error methods; stochastic processes; Lipschitz constant; MSPBE; accelerated GTD algorithms; accelerated gradient temporal difference learning algorithms; accelerated stochastic gradient descent algorithms; computational complexity; gradient TD algorithms; linear value function approximation; mean squared projected Bellman error; off-policy learning; policy evaluation problem; Acceleration; Approximation algorithms; Convergence; Function approximation; Radio access networks; Vectors;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 2014 IEEE Symposium on
Conference_Location :
Orlando, FL
Type :
conf
DOI :
10.1109/ADPRL.2014.7010611
Filename :
7010611