  • DocumentCode
    1941066
  • Title
    Optimality of LSTD and its Relation to MC
  • Author
    Grunewalder, S.; Hochreiter, Sepp; Obermayer, Klaus
  • Author_Institution
    Univ. of Technol. Berlin, Berlin
  • fYear
    2007
  • fDate
    12-17 Aug. 2007
  • Firstpage
    338
  • Lastpage
    343
  • Abstract
    In this analytical study we compare the risk of the Monte Carlo (MC) and least-squares temporal difference (LSTD) estimators. We prove that, for acyclic Markov Reward Processes (MRPs), LSTD has minimal risk among all unbiased estimators for any convex loss function. Comparing the Monte Carlo estimator, which does not assume a Markov structure, with LSTD, we find that the two are equivalent if both estimators have the same amount of information. The theoretical results are supported by an empirical evaluation of the estimators.
  • Keywords
    Markov processes; Monte Carlo methods; convex programming; estimation theory; least squares approximations; statistical analysis; Monte Carlo method; acyclic Markov reward process; convex loss function; least-squares temporal difference estimator; statistical estimation theory; Concrete; Convergence; Learning; Materials requirements planning; Neural networks; Risk analysis; State estimation; Upper bound; Yield estimation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks, 2007. IJCNN 2007. International Joint Conference on
  • Conference_Location
    Orlando, FL
  • ISSN
    1098-7576
  • Print_ISBN
    978-1-4244-1379-9
  • Electronic_ISBN
    1098-7576
  • Type
    conf
  • DOI
    10.1109/IJCNN.2007.4370979
  • Filename
    4370979
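The abstract compares two estimators of state values in an acyclic MRP. The following Python sketch computes both on small hand-made trajectories. It is meant only as an illustration and is not the authors' code or experimental setup. Every-visit Monte Carlo averages the observed returns from each state. Tabular LSTD with one-hot features phi solves A v = b, where A = sum phi(s)(phi(s) - phi(s'))^T and b = sum phi(s) r. The sketch rests on several assumptions: returns are undiscounted, the reward is received on leaving a state, and states are numbered 0..n-1 with each one visited at least once (otherwise A is singular). The trajectory format and the names `mc_estimate` and `lstd_estimate` are hypothetical.

```python
import numpy as np
from collections import defaultdict

# Hypothetical trajectory format: a list of (state, reward) pairs, where the
# reward is received on leaving that state and the episode terminates after
# the last pair. Returns are undiscounted (gamma = 1), which is well defined
# because the MRP is assumed to be acyclic.


def mc_estimate(trajs):
    """Every-visit Monte Carlo: average the observed return from each state.

    In an acyclic MRP a state occurs at most once per episode, so this is
    also the first-visit estimate.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for traj in trajs:
        g = 0.0
        for state, reward in reversed(traj):  # accumulate returns backwards
            g += reward
            total[state] += g
            count[state] += 1
    return {s: total[s] / count[s] for s in total}


def lstd_estimate(trajs, n_states):
    """Tabular LSTD(0) with one-hot features: solve A v = b.

    A = sum_t phi(s_t) (phi(s_t) - phi(s_{t+1}))^T and b = sum_t phi(s_t) r_t,
    with phi(terminal) = 0. Assumes every state 0..n_states-1 is visited at
    least once; otherwise A is singular.
    """
    eye = np.eye(n_states)
    A = np.zeros((n_states, n_states))
    b = np.zeros(n_states)
    for traj in trajs:
        for t, (state, reward) in enumerate(traj):
            phi = eye[state]
            if t + 1 < len(traj):
                phi_next = eye[traj[t + 1][0]]
            else:
                phi_next = np.zeros(n_states)  # terminal state has value 0
            A += np.outer(phi, phi - phi_next)
            b += phi * reward
    return np.linalg.solve(A, b)


# Tree-shaped MRP: episodes start in state 0 and move to state 1 or state 2.
tree = [[(0, 1.0), (1, 2.0)], [(0, 0.0), (2, 5.0)], [(0, 1.0), (1, 0.0)]]
# Merge: states 0 and 1 both lead into state 2.
merge = [[(0, 0.0), (2, 10.0)], [(1, 0.0), (2, 0.0)]]

for name, data in [("tree", tree), ("merge", merge)]:
    mc = mc_estimate(data)
    lstd = lstd_estimate(data, 3)
    print(name, "MC:", [mc[s] for s in range(3)], "LSTD:", lstd.round(6).tolist())
```

On the tree-shaped data, every state has a single predecessor and all episodes start in state 0, so the two estimates coincide. On the merge data, the LSTD estimate for state 0 also uses the return from state 2 that was observed on the episode through state 1. The MC estimate for state 0 does not use that return, so the two estimates differ. This toy case is only meant to make the estimators concrete. It does not reproduce the paper's general result.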