Optimality of LSTD and its Relation to MC

Author

Grunewalder, S.; Hochreiter, Sepp; Obermayer, Klaus

Author_Institution

Univ. of Technol. Berlin, Berlin

fYear

2007

fDate

12-17 Aug. 2007

Firstpage

338

Lastpage

343

Abstract

In this analytical study we compare the risk of the Monte Carlo (MC) and the least-squares temporal-difference (LSTD) estimators. We prove that, for acyclic Markov reward processes (MRPs), LSTD has minimal risk among unbiased estimators for any convex loss function. Comparing the MC estimator, which does not assume a Markov structure, with LSTD, we find that the two estimators are equivalent when both have access to the same amount of information. The theoretical results are supported by an empirical evaluation of the estimators.
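To make the contrast between the two estimators concrete, here is a minimal sketch of tabular MC and tabular LSTD(0) value estimation on a hypothetical three-state acyclic MRP chosen for illustration only (it is not the paper's experimental setup). It assumes one-hot (tabular) features, undiscounted episodes (gamma = 1), and a reward received on leaving each state. The names `sample_episode`, `mc_estimate`, and `lstd_estimate` are illustrative, not from the paper. MC averages the returns actually observed from each state, while LSTD solves A theta = b, which lets it reuse observations at a shared successor state.

```python
import numpy as np

# Hypothetical toy acyclic MRP (illustration only, not from the paper):
#   state 0 --(reward 0)--> state 2
#   state 1 --(reward 0)--> state 2
#   state 2 --(reward 1 with prob. p, else 0)--> terminal
N_STATES = 3


def sample_episode(rng, start, p=0.5):
    """Return [(state, reward), ...]; each reward is received on leaving that state."""
    r2 = float(rng.random() < p)
    return [(start, 0.0), (2, r2)]


def mc_estimate(episodes, n_states, gamma=1.0):
    """Monte Carlo: average the observed returns from each state.

    In an acyclic MRP a state occurs at most once per episode, so
    first-visit and every-visit MC coincide.
    """
    totals = np.zeros(n_states)
    counts = np.zeros(n_states)
    for ep in episodes:
        g = 0.0
        for s, r in reversed(ep):  # accumulate returns backwards
            g = r + gamma * g
            totals[s] += g
            counts[s] += 1
    # States never visited get NaN instead of a division by zero.
    return np.divide(totals, counts, out=np.full(n_states, np.nan), where=counts > 0)


def lstd_estimate(episodes, n_states, gamma=1.0):
    """Tabular LSTD(0): A = sum phi(s)(phi(s) - gamma*phi(s'))^T, b = sum phi(s) r.

    The terminal state has phi = 0. Every non-terminal state must be
    visited at least once, otherwise A is singular.
    """
    A = np.zeros((n_states, n_states))
    b = np.zeros(n_states)
    for ep in episodes:
        for t, (s, r) in enumerate(ep):
            A[s, s] += 1.0
            if t + 1 < len(ep):  # successor is non-terminal
                A[s, ep[t + 1][0]] -= gamma
            b[s] += r
    return np.linalg.solve(A, b)


# Usage: few episodes start at state 0, many start at state 1.
rng = np.random.default_rng(0)
episodes = [sample_episode(rng, start=0) for _ in range(5)] + [
    sample_episode(rng, start=1) for _ in range(50)
]
v_mc = mc_estimate(episodes, N_STATES)
v_lstd = lstd_estimate(episodes, N_STATES)
print("MC:  ", v_mc)
print("LSTD:", v_lstd)
```

In this toy MRP both start states feed into state 2, so the LSTD value of state 0 equals its estimate for state 2, which pools the reward observations from all 55 episodes. The MC value of state 0 uses only the 5 episodes that start there. For state 2, which every episode visits, the two estimates coincide.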

Keywords

Markov processes; Monte Carlo methods; convex programming; estimation theory; least squares approximations; statistical analysis; Monte Carlo method; acyclic Markov reward process; convex loss function; least-squares temporal difference estimator; statistical estimation theory; Concrete; Convergence; Learning; Materials requirements planning; Neural networks; Risk analysis; State estimation; Upper bound; Yield estimation

fLanguage

English

Publisher

IEEE

Conference_Title

International Joint Conference on Neural Networks (IJCNN 2007)

Conference_Location

Orlando, FL

ISSN

1098-7576

Print_ISBN

978-1-4244-1379-9

Electronic_ISBN

1098-7576

Type

conf

DOI

10.1109/IJCNN.2007.4370979

Filename

4370979

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=1941066