• DocumentCode
    110280
  • Title
    An Equivalence Between Adaptive Dynamic Programming With a Critic and Backpropagation Through Time
  • Author
    Fairbank, Michael ; Alonso, E. ; Prokhorov, Danil
  • Author_Institution
    Dept. of Comput. Sci., City Univ. London, London, UK
  • Volume
    24
  • Issue
    12
  • fYear
    2013
  • fDate
    Dec. 2013
  • Firstpage
    2088
  • Lastpage
    2100
  • Abstract
    We consider the adaptive dynamic programming technique called Dual Heuristic Programming (DHP), which is designed to learn a critic function, when using learned model functions of the environment. DHP is designed for optimizing control problems in large and continuous state spaces. We extend DHP into a new algorithm that we call Value-Gradient Learning, VGL(λ), and prove equivalence of an instance of the new algorithm to Backpropagation Through Time for Control with a greedy policy. Not only does this equivalence provide a link between these two different approaches, but it also enables our variant of DHP to have guaranteed convergence, under certain smoothness conditions and a greedy policy, when using a general smooth nonlinear function approximator for the critic. We consider several experimental scenarios including some that prove divergence of DHP under a greedy policy, which contrasts against our proven-convergent algorithm.
  • Keywords
    backpropagation; dynamic programming; heuristic programming; learning (artificial intelligence); DHP; VGL; adaptive dynamic programming; backpropagation through time; continuous state spaces; control problem optimization; critic function; dual heuristic programming; general smooth nonlinear function approximator; greedy policy; learned model functions; value-gradient learning; Algorithm design and analysis; Approximation algorithms; Convergence; Equations; Neural networks; Trajectory; Vectors; Adaptive dynamic programming (ADP); backpropagation through time; dual heuristic programming (DHP); neural networks; value-gradient learning
  • fLanguage
    English
  • Journal_Title
    Neural Networks and Learning Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    2162-237X
  • Type
    jour
  • DOI
    10.1109/TNNLS.2013.2271778
  • Filename
    6588970