DocumentCode
954877
Title
Potential-based online policy iteration algorithms for Markov decision processes
Author
Fang, Hai-Tao; Cao, Xi-Ren
Author_Institution
Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Beijing, China
Volume
49
Issue
4
fYear
2004
fDate
4/1/2004
Firstpage
493
Lastpage
505
Abstract
Performance potentials play a crucial role in performance sensitivity analysis and policy iteration of Markov decision processes. The potentials can be estimated on a single sample path of a Markov process. In this paper, we propose two potential-based online policy iteration algorithms for performance optimization of Markov systems. The algorithms are based on online estimation of potentials and stochastic approximation. We prove that with these two algorithms the optimal policy can be attained after a finite number of iterations. A simulation example is given to illustrate the main ideas and the convergence rates of the algorithms.
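The abstract combines two ideas: potentials estimated from a single sample path, and policy iteration driven by those estimates. The minimal Python sketch below illustrates both. It is not the authors' algorithm. Their online algorithms update the estimates by stochastic approximation, whereas this sketch simply re-estimates the potentials from a fixed-length path at each iteration. The sketch uses a standard truncated estimator, g(i) ≈ E[Σ_{l=0}^{L-1} (f(X_l) − η) | X_0 = i], and the average-reward improvement step d'(i) = argmax_a [f(i,a) + Σ_j p^a(i,j) g(j)]. The two-state MDP, the horizon L, the tolerance, and every function name (`simulate`, `estimate_potentials`, `improve`, `sample_path_policy_iteration`, `exact_average_reward`) are hypothetical choices made only for illustration.

```python
import random

# Hypothetical 2-state, 2-action MDP (illustrative numbers, not from the paper).
# P[i][a] is the transition row from state i under action a; R[i][a] is the reward.
P = [
    [[0.9, 0.1], [0.2, 0.8]],    # state 0: action 0, action 1
    [[0.3, 0.7], [0.05, 0.95]],  # state 1: action 0, action 1
]
R = [[1.0, 0.0], [4.0, 2.0]]
N_STATES, N_ACTIONS = 2, 2


def simulate(policy, steps, rng, start=0):
    """Generate one sample path X_0, ..., X_{steps-1} under a deterministic policy."""
    path, x = [], start
    for _ in range(steps):
        path.append(x)
        x = rng.choices(range(N_STATES), weights=P[x][policy[x]])[0]
    return path


def estimate_potentials(policy, n_steps, horizon, rng):
    """Estimate average reward eta and truncated potentials g from ONE sample path.

    g(i) is the mean of sum_{l=0}^{horizon-1} (f(X_{n+l}) - eta) over visits X_n = i.
    Potentials are defined only up to an additive constant, which does not change
    the improvement step. A never-visited state gets g = 0 (an arbitrary choice).
    """
    path = simulate(policy, n_steps + horizon, rng)
    f = [R[x][policy[x]] for x in path]
    eta = sum(f[:n_steps]) / n_steps
    prefix = [0.0]  # prefix[k] = sum_{m<k} (f[m] - eta)
    for v in f:
        prefix.append(prefix[-1] + v - eta)
    totals, counts = [0.0] * N_STATES, [0] * N_STATES
    for n in range(n_steps):
        totals[path[n]] += prefix[n + horizon] - prefix[n]
        counts[path[n]] += 1
    g = [totals[i] / counts[i] if counts[i] else 0.0 for i in range(N_STATES)]
    return eta, g


def improve(policy, g, tol):
    """Pick argmax_a [R(i,a) + sum_j P(i,a,j) g(j)] in each state.

    The current action is kept unless another beats it by more than tol,
    a simple guard against switching on estimation noise.
    """
    new = []
    for i in range(N_STATES):
        q = [R[i][a] + sum(p * gj for p, gj in zip(P[i][a], g)) for a in range(N_ACTIONS)]
        best = max(range(N_ACTIONS), key=lambda a: q[a])
        new.append(best if q[best] > q[policy[i]] + tol else policy[i])
    return new


def sample_path_policy_iteration(policy, rng, n_steps=20000, horizon=50,
                                 tol=0.05, max_iters=20):
    """Alternate sample-path estimation and improvement until the policy is stable."""
    for _ in range(max_iters):
        eta, g = estimate_potentials(policy, n_steps, horizon, rng)
        new = improve(policy, g, tol)
        if new == policy:
            break
        policy = new
    return policy, eta


def exact_average_reward(policy, iters=2000):
    """Exact long-run average reward via power iteration on the stationary distribution."""
    pi = [1.0 / N_STATES] * N_STATES
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][policy[i]][j] for i in range(N_STATES))
              for j in range(N_STATES)]
    return sum(pi[i] * R[i][policy[i]] for i in range(N_STATES))


if __name__ == "__main__":
    rng = random.Random(0)
    policy, eta_hat = sample_path_policy_iteration([0, 0], rng)
    print(policy, round(eta_hat, 2), round(exact_average_reward(policy), 3))
```

With these numbers, exhaustive evaluation of the four deterministic policies gives the optimum [1, 0] (action 1 in state 0, action 0 in state 1) with average reward 32/11 ≈ 2.909. Starting from [0, 0], the sketch is expected to reach that policy within a few iterations. The truncation horizon works here because this chain mixes quickly. A slowly mixing chain would need a larger `horizon` or, as in the paper, recursive estimates refined over time.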
Keywords
Markov processes; convergence of numerical methods; gradient methods; iterative methods; optimisation; recursive estimation; sensitivity analysis; Markov decision processes; algorithm convergence rates; gradient-based online approach; online policy iteration algorithms; recursive optimization; single-sample-path estimation; stochastic approximation; Approximation algorithms; Communication networks; Convergence; Optimization; State-space methods; Steady-state; Stochastic processes; Transportation
fLanguage
English
Journal_Title
IEEE Transactions on Automatic Control
Publisher
IEEE
ISSN
0018-9286
Type
jour
DOI
10.1109/TAC.2004.825647
Filename
1284713