Convergence of Model-Based Temporal Difference Learning for Control

Author

Van Hasselt, Hado; Wiering, Marco A.

Author_Institution

Dept. of Inf. & Comput. Sci., Utrecht Univ.

fYear

2007

fDate

1-5 April 2007

Firstpage

Lastpage

Abstract

A theoretical analysis of model-based temporal difference learning for control is given, leading to a proof of convergence. This work differs from earlier work on the convergence of temporal difference learning by proving convergence to the optimal value function. This means that, rather than finding the values of the current policy, the policy is updated in such a way that the optimal policy is ultimately guaranteed to be reached.
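To make the idea of "model-based temporal difference learning for control" concrete, here is a minimal sketch. It assumes a small tabular MDP, maximum-likelihood estimates of the transition probabilities and expected rewards built from visit counts, and a constant step size alpha. After each transition the model is updated, and V(s) is moved toward the greedy one-step lookahead max_a [R_hat(s,a) + gamma * sum_s' P_hat(s'|s,a) V(s')]. The class name, method names, and the toy MDP are hypothetical illustrations. This is not necessarily the exact algorithm or the convergence conditions analyzed in the paper.

```python
import random


class ModelBasedTDControl:
    """Tabular model-based TD learning for control (hypothetical sketch)."""

    def __init__(self, n_states, n_actions, gamma=0.9, alpha=0.5):
        self.n_states = n_states
        self.n_actions = n_actions
        self.gamma = gamma
        self.alpha = alpha
        self.V = [0.0] * n_states
        self.n_sa = {}   # N(s, a): times action a was taken in state s
        self.n_sas = {}  # N(s, a, s'): observed transitions s -a-> s'
        self.r_sum = {}  # sum of rewards observed after taking a in s

    def update_model(self, s, a, r, s_next):
        """Update the count-based (maximum-likelihood) model estimates."""
        self.n_sa[(s, a)] = self.n_sa.get((s, a), 0) + 1
        self.n_sas[(s, a, s_next)] = self.n_sas.get((s, a, s_next), 0) + 1
        self.r_sum[(s, a)] = self.r_sum.get((s, a), 0.0) + r

    def q_from_model(self, s, a):
        """One-step lookahead: R_hat(s,a) + gamma * sum_s' P_hat(s'|s,a) V(s')."""
        n = self.n_sa[(s, a)]
        r_hat = self.r_sum[(s, a)] / n
        expected_next = sum(
            self.n_sas.get((s, a, s2), 0) / n * self.V[s2]
            for s2 in range(self.n_states)
        )
        return r_hat + self.gamma * expected_next

    def visited_actions(self, s):
        return [a for a in range(self.n_actions) if self.n_sa.get((s, a), 0) > 0]

    def observe(self, s, a, r, s_next):
        """Update the model, then move V(s) toward the greedy model-based target."""
        self.update_model(s, a, r, s_next)
        # Max over the actions tried so far in s: this is the "control" part,
        # which targets the optimal values rather than those of a fixed policy.
        target = max(self.q_from_model(s, b) for b in self.visited_actions(s))
        self.V[s] += self.alpha * (target - self.V[s])

    def greedy_policy(self):
        """Greedy action per state under the learned model (0 if unvisited)."""
        policy = []
        for s in range(self.n_states):
            actions = self.visited_actions(s)
            best = 0
            if actions:
                best = max(actions, key=lambda b: self.q_from_model(s, b))
            policy.append(best)
        return policy


# Hypothetical deterministic toy MDP: (state, action) -> (reward, next_state).
TOY_MDP = {
    (0, 0): (0.0, 1), (0, 1): (1.0, 0),
    (1, 0): (2.0, 0), (1, 1): (0.0, 1),
}


def train(steps=20000, seed=0):
    rng = random.Random(seed)
    agent = ModelBasedTDControl(n_states=2, n_actions=2, gamma=0.9, alpha=0.5)
    s = 0
    for _ in range(steps):
        a = rng.randrange(2)  # uniformly random exploration
        r, s_next = TOY_MDP[(s, a)]
        agent.observe(s, a, r, s_next)
        s = s_next
    return agent


agent = train()
print(agent.V, agent.greedy_policy())
```

For this toy MDP with gamma = 0.9, the Bellman optimality equations V(0) = max(0.9 V(1), 1 + 0.9 V(0)) and V(1) = max(2 + 0.9 V(0), 0.9 V(1)) have the solution V* = (10, 11). The optimal policy takes action 1 in state 0 and action 0 in state 1. The learned values should approach these numbers, even though the behaviour policy here is purely random.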

Keywords

convergence; learning (artificial intelligence); optimal control; optimal value function; proof of convergence; temporal difference learning; Convergence; Dynamic programming; Intelligent systems; Learning; Stochastic processes; Telephony

fLanguage

English

Publisher

IEEE

Conference_Title

Approximate Dynamic Programming and Reinforcement Learning, 2007. ADPRL 2007. IEEE International Symposium on

Conference_Location

Honolulu, HI

Print_ISBN

1-4244-0706-0

Type

conf

DOI

10.1109/ADPRL.2007.368170

Filename

4220815

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=2717173