DocumentCode
2717173
Title
Convergence of Model-Based Temporal Difference Learning for Control
Author
Van Hasselt, Hado; Wiering, Marco A.
Author_Institution
Dept. of Inf. & Comput. Sci., Utrecht Univ.
fYear
2007
fDate
1-5 April 2007
Firstpage
60
Lastpage
67
Abstract
A theoretical analysis of model-based temporal difference learning for control is given, leading to a proof of convergence. This work differs from earlier work on the convergence of temporal difference learning by proving convergence to the optimal value function. This means that it is not the values of the current policy that are found; instead, the policy is updated in such a manner that the optimal policy is ultimately guaranteed to be reached.
Keywords
convergence; learning (artificial intelligence); optimal control; optimal value function; proof of convergence; temporal difference learning; Convergence; Dynamic programming; Intelligent systems; Learning; Stochastic processes; Telephony
fLanguage
English
Publisher
IEEE
Conference_Titel
Approximate Dynamic Programming and Reinforcement Learning, 2007. ADPRL 2007. IEEE International Symposium on
Conference_Location
Honolulu, HI
Print_ISBN
1-4244-0706-0
Type
conf
DOI
10.1109/ADPRL.2007.368170
Filename
4220815