DocumentCode
3433937
Title
TD-learning with exploration
Author
Meyn, Sean P. ; Surana, Amit
Author_Institution
Department of Electrical and Computer Engineering and the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign (UIUC), USA
fYear
2011
fDate
12-15 Dec. 2011
Firstpage
148
Lastpage
155
Abstract
We introduce exploration in the TD-learning algorithm to approximate the value function for a given policy. In this way we can modify the norm used for approximation, “zooming in” to a region of interest in the state space. We also provide extensions to SARSA to eliminate the need for numerical integration in policy improvement. Construction of the algorithm and its analysis build on recent general results concerning the spectral theory of Markov chains and positive operators.
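As background for the abstract, the sketch below illustrates the baseline the paper builds on: plain TD(0) policy evaluation with linear function approximation on a small Markov chain. This is a generic, hypothetical illustration of TD-learning for a fixed policy, not the exploration-based variant the paper introduces; the chain, rewards, features, and step size are all invented for the example.

```python
import numpy as np

# Hypothetical example: generic TD(0) with linear (here one-hot, i.e.
# tabular) function approximation for evaluating a fixed policy.
# This is NOT the paper's exploration-based algorithm, only the
# standard TD-learning baseline it modifies.

rng = np.random.default_rng(0)

# Small 3-state Markov chain under a fixed policy, with rewards.
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
r = np.array([1.0, 0.0, 2.0])
gamma = 0.9  # discount factor

def phi(s):
    """One-hot feature vector; the tabular special case of a
    linear approximation V(s) ~ phi(s) . theta."""
    f = np.zeros(3)
    f[s] = 1.0
    return f

theta = np.zeros(3)  # weights of the linear value approximation
alpha = 0.05         # constant step size
s = 0
for t in range(20000):
    s_next = rng.choice(3, p=P[s])
    # TD(0) temporal-difference error and stochastic-gradient step.
    delta = r[s] + gamma * phi(s_next) @ theta - phi(s) @ theta
    theta += alpha * delta * phi(s)
    s = s_next

# Exact value function for comparison: V = (I - gamma P)^{-1} r.
V_exact = np.linalg.solve(np.eye(3) - gamma * P, r)
print(np.round(theta, 2))
print(np.round(V_exact, 2))
```

With one-hot features the linear approximation is exact, so the learned weights should track the exact value function up to step-size noise. The paper's contribution is to change the sampling (exploration) so that the approximation error is weighted toward a region of interest of the state space.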
Keywords
Approximation algorithms; Equations; Function approximation; Linear approximation; Markov processes; Mathematical model
fLanguage
English
Publisher
ieee
Conference_Titel
2011 50th IEEE Conference on Decision and Control and European Control Conference (CDC-ECC)
Conference_Location
Orlando, FL, USA
ISSN
0743-1546
Print_ISBN
978-1-61284-800-6
Electronic_ISBN
0743-1546
Type
conf
DOI
10.1109/CDC.2011.6160851
Filename
6160851