DocumentCode :
2703131
Title :
Convergence behavior of temporal difference learning
Author :
Malhotra, Raj P.
Author_Institution :
Dept. of Electr. Eng., Dayton Univ., OH, USA
Volume :
2
fYear :
1996
fDate :
20-23 May 1996
Firstpage :
887
Abstract :
Temporal Difference Learning is an important class of incremental learning procedures that learn to predict the outcomes of sequential processes through experience. Although these algorithms have been used in a variety of notable intelligent systems, such as Samuel's checker player and Tesauro's backgammon program, their convergence properties remain poorly understood. This paper provides a brief summary of the theoretical basis for these algorithms and documents their observed convergence performance in a variety of experiments. The implications of these results are also briefly discussed.
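The prediction setting the abstract describes can be illustrated with a minimal TD(0) update on the classic five-state random walk; this is a sketch of the general technique, not code from the paper, and the function name, step size, and episode count are illustrative assumptions.

```python
import random

def td0_random_walk(episodes=5000, alpha=0.1, seed=0):
    """TD(0) value prediction on a five-state random walk.

    States 1..5 sit between two terminal states (0 and 6); the walk
    starts in the middle, moves left or right uniformly at random, and
    yields reward 1 only on reaching the right terminal. The true state
    values are 1/6, 2/6, ..., 5/6.
    """
    rng = random.Random(seed)
    V = [0.0] * 7  # V[0] and V[6] are terminal states with value 0
    for _ in range(episodes):
        s = 3  # start in the middle state
        while 0 < s < 6:
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == 6 else 0.0
            # TD(0) update: V(s) <- V(s) + alpha * (r + V(s') - V(s))
            V[s] += alpha * (r + V[s_next] - V[s])
            s = s_next
    return V[1:6]  # estimated values of the non-terminal states

print(td0_random_walk())
```

With a small constant step size the estimates fluctuate around the true values rather than converging exactly, which is one facet of the convergence behavior the paper examines.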
Keywords :
convergence; learning (artificial intelligence); Backgammon program; Samuel; Temporal Difference Learning; Tesauro; checker-player; convergence performance; incremental learning; intelligent systems; sequential processes; Control systems; Convergence; Delay; Feedback; Intelligent systems; Iterative algorithms; Iterative methods; Learning; Pattern recognition; Yarn;
fLanguage :
English
Publisher :
ieee
Conference_Title :
Proceedings of the IEEE 1996 National Aerospace and Electronics Conference (NAECON 1996)
Conference_Location :
Dayton, OH
ISSN :
0547-3578
Print_ISBN :
0-7803-3306-3
Type :
conf
DOI :
10.1109/NAECON.1996.517756
Filename :
517756