• DocumentCode
    2160328
  • Title
    Q-learning algorithms for optimal stopping based on least squares
  • Author
    Yu, Huizhen ; Bertsekas, Dimitri P.
  • Author_Institution
    Helsinki Institute for Information Technology, University of Helsinki, Helsinki, Finland
  • fYear
    2007
  • fDate
    2-5 July 2007
  • Firstpage
    2368
  • Lastpage
    2375
  • Abstract
    We consider the solution of discounted optimal stopping problems using linear function approximation methods. A Q-learning algorithm for such problems, proposed by Tsitsiklis and Van Roy, is based on the method of temporal differences and stochastic approximation. We propose alternative algorithms, which are based on projected value iteration ideas and least squares. We prove the convergence of some of these algorithms and discuss their properties.
  • Keywords
    Markov processes; approximation theory; decision theory; dynamic programming; iterative methods; learning (artificial intelligence); least squares approximations; pricing; DP; Markovian decision problem; Q-learning algorithm; financial derivative pricing; least squares; linear function approximation method; optimal stopping problem; stochastic approximation method; temporal difference method; value iteration; Approximation algorithms; Convergence; Equations; Q-factor; Vectors
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2007 European Control Conference (ECC)
  • Conference_Location
    Kos, Greece
  • Print_ISBN
    978-3-9524173-8-6
  • Type
    conf
  • Filename
    7068523