Q-learning algorithms for optimal stopping based on least squares

Author

Yu, Huizhen ; Bertsekas, Dimitri P.

Author_Institution

Helsinki Inst. for Inf. Technol., Univ. of Helsinki, Helsinki, Finland

fYear

2007

fDate

2-5 July 2007

Firstpage

2368

Lastpage

2375

Abstract

We consider the solution of discounted optimal stopping problems using linear function approximation methods. A Q-learning algorithm for such problems, proposed by Tsitsiklis and Van Roy, is based on the method of temporal differences and stochastic approximation. We propose alternative algorithms, which are based on projected value iteration ideas and least squares. We prove the convergence of some of these algorithms and discuss their properties.

Keywords

Markov processes; approximation theory; decision theory; dynamic programming; iterative methods; learning (artificial intelligence); least squares approximations; pricing; DP; Markovian decision problem; Q-learning algorithm; financial derivative pricing; least squares; linear function approximation method; optimal stopping problem; stochastic approximation method; temporal difference method; value iteration; Approximation algorithms; Convergence; Equations; Q-factor; Vectors

fLanguage

English

Publisher

IEEE

Conference_Titel

2007 European Control Conference (ECC)

Conference_Location

Kos, Greece

Print_ISBN

978-3-9524173-8-6

Type

conf

Filename

7068523

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=2160328