• DocumentCode
    2777594
  • Title
    Value-gradient learning
  • Author
    Fairbank, Michael ; Alonso, Eduardo
  • Author_Institution
    Dept. of Comput., City Univ. London, London, UK
  • fYear
    2012
  • fDate
    10-15 June 2012
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    We describe an Adaptive Dynamic Programming algorithm, VGL(λ), for learning a critic function over a large continuous state space. The algorithm, which requires a learned model of the environment, extends Dual Heuristic Dynamic Programming to include a bootstrapping parameter analogous to that used in the reinforcement learning algorithm TD(λ). We provide on-line and batch mode implementations of the algorithm, and summarise the theoretical relationships and motivations for using this method over its precursor algorithms, Dual Heuristic Dynamic Programming and TD(λ). Experiments for control problems using a neural network and greedy policy are provided.
  • Keywords
    dynamic programming; learning (artificial intelligence); adaptive dynamic programming; bootstrapping parameter; control problems; critic function; dual heuristic dynamic programming; greedy policy; large continuous state space; neural network; reinforcement learning; value-gradient learning; Approximation algorithms; Dynamic programming; Equations; Heuristic algorithms; Mathematical model; Trajectory; Vectors; Adaptive Dynamic Programming; DHP; Dual Heuristic Dynamic Programming; Value-Gradient Learning;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks (IJCNN), The 2012 International Joint Conference on
  • Conference_Location
    Brisbane, QLD
  • ISSN
    2161-4393
  • Print_ISBN
    978-1-4673-1488-6
  • Electronic_ISBN
    2161-4393
  • Type
    conf
  • DOI
    10.1109/IJCNN.2012.6252791
  • Filename
    6252791