DocumentCode
2160328
Title
Q-learning algorithms for optimal stopping based on least squares
Author
Yu, Huizhen ; Bertsekas, Dimitri P.
Author_Institution
Helsinki Inst. for Inf. Technol., Univ. of Helsinki, Helsinki, Finland
fYear
2007
fDate
2-5 July 2007
Firstpage
2368
Lastpage
2375
Abstract
We consider the solution of discounted optimal stopping problems using linear function approximation methods. A Q-learning algorithm for such problems, proposed by Tsitsiklis and Van Roy, is based on the method of temporal differences and stochastic approximation. We propose alternative algorithms, which are based on projected value iteration ideas and least squares. We prove the convergence of some of these algorithms and discuss their properties.
Keywords
Markov processes; approximation theory; decision theory; dynamic programming; iterative methods; learning (artificial intelligence); least squares approximations; pricing; DP; Markovian decision problem; Q-learning algorithm; financial derivative pricing; least squares; linear function approximation method; optimal stopping problem; stochastic approximation method; temporal difference method; value iteration; Approximation algorithms; Convergence; Equations; Q-factor; Vectors
fLanguage
English
Publisher
IEEE
Conference_Titel
Control Conference (ECC), 2007 European
Conference_Location
Kos
Print_ISBN
978-3-9524173-8-6
Type
conf
Filename
7068523