An Equivalence Between Adaptive Dynamic Programming With a Critic and Backpropagation Through Time

Author

Fairbank, Michael ; Alonso, E. ; Prokhorov, Danil

Author_Institution

Dept. of Comput. Sci., City Univ. London, London, UK

Volume

24

Issue

12

fYear

2013

fDate

Dec. 2013

Firstpage

2088

Lastpage

2100

Abstract

We consider the adaptive dynamic programming technique called Dual Heuristic Programming (DHP), which is designed to learn a critic function when using learned model functions of the environment. DHP is designed for optimizing control problems in large and continuous state spaces. We extend DHP into a new algorithm that we call Value-Gradient Learning, VGL(λ), and prove that an instance of the new algorithm is equivalent to Backpropagation Through Time for Control with a greedy policy. This equivalence not only provides a link between these two different approaches but also gives our variant of DHP guaranteed convergence, under certain smoothness conditions and a greedy policy, when a general smooth nonlinear function approximator is used for the critic. We consider several experimental scenarios, including some that prove divergence of DHP under a greedy policy, in contrast to our provably convergent algorithm.

Keywords

backpropagation; dynamic programming; heuristic programming; learning (artificial intelligence); DHP; VGL; adaptive dynamic programming (ADP); backpropagation through time; continuous state spaces; control problem optimization; critic function; dual heuristic programming (DHP); general smooth nonlinear function approximator; greedy policy; learned model functions; value-gradient learning; Algorithm design and analysis; Approximation algorithms; Convergence; Equations; Neural networks; Trajectory; Vectors

fLanguage

English

Journal_Title

Neural Networks and Learning Systems, IEEE Transactions on

Publisher

IEEE

ISSN

2162-237X

Type

jour

DOI

10.1109/TNNLS.2013.2271778

Filename

6588970