DocumentCode
1941066
Title
Optimality of LSTD and its Relation to MC
Author
Grunewalder, S. ; Hochreiter, Sepp ; Obermayer, Klaus
Author_Institution
Univ. of Technol. Berlin, Berlin
fYear
2007
fDate
12-17 Aug. 2007
Firstpage
338
Lastpage
343
Abstract
In this analytical study, we compare the risk of the Monte Carlo (MC) estimator and the least-squares temporal difference (LSTD) estimator. We prove that, for acyclic Markov Reward Processes (MRPs), LSTD has minimal risk for any convex loss function within the class of unbiased estimators. Comparing LSTD with the Monte Carlo estimator, which does not assume a Markov structure, we find that the two are equivalent if both estimators have the same amount of information. The theoretical results are supported by an empirical evaluation of the estimators.
Keywords
Markov processes; Monte Carlo methods; convex programming; estimation theory; least squares approximations; statistical analysis; Monte Carlo method; acyclic Markov reward process; convex loss function; least-squares temporal difference estimator; statistical estimation theory; Concrete; Convergence; Learning; Materials requirements planning; Neural networks; Risk analysis; State estimation; Upper bound; Yield estimation
fLanguage
English
Publisher
IEEE
Conference_Titel
Neural Networks, 2007. IJCNN 2007. International Joint Conference on
Conference_Location
Orlando, FL
ISSN
1098-7576
Print_ISBN
978-1-4244-1379-9
Electronic_ISBN
1098-7576
Type
conf
DOI
10.1109/IJCNN.2007.4370979
Filename
4370979