• DocumentCode
    2717173
  • Title

    Convergence of Model-Based Temporal Difference Learning for Control

  • Author

    Van Hasselt, Hado ; Wiering, Marco A.

  • Author_Institution
    Dept. of Inf. & Comput. Sci., Utrecht Univ.
  • fYear
    2007
  • fDate
    1-5 April 2007
  • Firstpage
    60
  • Lastpage
    67
  • Abstract
    A theoretical analysis of model-based temporal difference learning for control is given, leading to a proof of convergence. This work differs from earlier work on the convergence of temporal difference learning by proving convergence to the optimal value function. That is, rather than finding the values of the current policy, the policy is updated in such a manner that the optimal policy is ultimately guaranteed to be reached.
  • Keywords
    convergence; learning (artificial intelligence); optimal control; optimal value function; proof of convergence; temporal difference learning; Convergence; Dynamic programming; Intelligent systems; Learning; Stochastic processes; Telephony;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Approximate Dynamic Programming and Reinforcement Learning, 2007. ADPRL 2007. IEEE International Symposium on
  • Conference_Location
    Honolulu, HI
  • Print_ISBN
    1-4244-0706-0
  • Type

    conf

  • DOI
    10.1109/ADPRL.2007.368170
  • Filename
    4220815