DocumentCode :
1798060
Title :
Online learning control based on projected gradient temporal difference and advanced heuristic dynamic programming
Author :
Jian Fu ; Sujuan Wei ; Haibo He ; Shengyong Wang
Author_Institution :
Sch. of Autom., Wuhan Univ. of Technol., Wuhan, China
fYear :
2014
fDate :
6-11 July 2014
Firstpage :
3649
Lastpage :
3656
Abstract :
We present a novel online learning control algorithm (OLCPA) that comprises projected gradient temporal difference for the action-value function (PGTDAVF) and advanced heuristic dynamic programming with one step delay (AHD-POSD). PGTDAVF can guarantee the convergence of temporal difference (TD)-based policy learning with smooth action-value function approximators, such as neural networks. AHD-POSD, in turn, is a framework designed specifically to embed PGTDAVF for online learning control. It not only coincides with the intention of temporal difference but also enables PGTDAVF to remain effective under a nonidentical policy environment, which makes the approach more practical. In this way, the proposed algorithms achieve stability and practicability simultaneously. Finally, simulation of online learning control on a cart-pole benchmark demonstrates the practical control capability and efficiency of the presented method.
Keywords :
dynamic programming; function approximation; gradient methods; learning systems; AHD-POSD; OLCPA; PGTDAVF; TD-based policy learning; action-value function; advanced heuristic dynamic programming with one step delay; cart pole benchmark; neural networks; online learning control algorithm; projected gradient temporal difference; smooth action-value function approximators; Approximation algorithms; Delays; Dynamic programming; Heuristic algorithms; Indexes; Mathematical model; Neural networks;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Neural Networks (IJCNN), 2014 International Joint Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4799-6627-1
Type :
conf
DOI :
10.1109/IJCNN.2014.6889756
Filename :
6889756