DocumentCode
2777594
Title
Value-gradient learning
Author
Fairbank, Michael ; Alonso, Eduardo
Author_Institution
Dept. of Comput., City Univ. London, London, UK
fYear
2012
fDate
10-15 June 2012
Firstpage
1
Lastpage
8
Abstract
We describe VGL(λ), an Adaptive Dynamic Programming algorithm for learning a critic function over a large continuous state space. The algorithm, which requires a learned model of the environment, extends Dual Heuristic Dynamic Programming to include a bootstrapping parameter analogous to that used in the reinforcement learning algorithm TD(λ). We provide on-line and batch mode implementations of the algorithm, and summarise its theoretical relationships to, and the motivations for using it over, its precursor algorithms Dual Heuristic Dynamic Programming and TD(λ). Experiments on control problems using a neural network and a greedy policy are provided.
Keywords
dynamic programming; learning (artificial intelligence); adaptive dynamic programming; bootstrapping parameter; control problems; critic function; dual heuristic dynamic programming; greedy policy; large continuous state space; neural network; reinforcement learning; value-gradient learning; Approximation algorithms; Dynamic programming; Equations; Heuristic algorithms; Mathematical model; Trajectory; Vectors; Adaptive Dynamic Programming; DHP; Dual Heuristic Dynamic Programming; Value-Gradient Learning;
fLanguage
English
Publisher
IEEE
Conference_Titel
Neural Networks (IJCNN), The 2012 International Joint Conference on
Conference_Location
Brisbane, QLD
ISSN
2161-4393
Print_ISBN
978-1-4673-1488-6
Electronic_ISBN
2161-4393
Type
conf
DOI
10.1109/IJCNN.2012.6252791
Filename
6252791