DocumentCode
110280
Title
An Equivalence Between Adaptive Dynamic Programming With a Critic and Backpropagation Through Time
Author
Fairbank, Michael; Alonso, E.; Prokhorov, Danil
Author_Institution
Dept. of Comput. Sci., City Univ. London, London, UK
Volume
24
Issue
12
fYear
2013
fDate
Dec. 2013
Firstpage
2088
Lastpage
2100
Abstract
We consider the adaptive dynamic programming technique called Dual Heuristic Programming (DHP), which is designed to learn a critic function when using learned model functions of the environment. DHP is designed for optimizing control problems in large and continuous state spaces. We extend DHP into a new algorithm that we call Value-Gradient Learning, VGL(λ), and prove equivalence of an instance of the new algorithm to Backpropagation Through Time for Control with a greedy policy. Not only does this equivalence provide a link between these two different approaches, but it also enables our variant of DHP to have guaranteed convergence, under certain smoothness conditions and a greedy policy, when using a general smooth nonlinear function approximator for the critic. We consider several experimental scenarios, including some that prove divergence of DHP under a greedy policy, which contrasts with our provably convergent algorithm.
Keywords
backpropagation; dynamic programming; heuristic programming; learning (artificial intelligence); DHP; VGL; adaptive dynamic programming; backpropagation through time; continuous state spaces; control problem optimization; critic function; dual heuristic programming; general smooth nonlinear function approximator; greedy policy; learned model functions; value-gradient learning; Algorithm design and analysis; Approximation algorithms; Convergence; Equations; Neural networks; Trajectory; Vectors; Adaptive dynamic programming (ADP); backpropagation through time; dual heuristic programming (DHP); neural networks; value-gradient learning
fLanguage
English
Journal_Title
Neural Networks and Learning Systems, IEEE Transactions on
Publisher
IEEE
ISSN
2162-237X
Type
jour
DOI
10.1109/TNNLS.2013.2271778
Filename
6588970