DocumentCode :
954877
Title :
Potential-based online policy iteration algorithms for Markov decision processes
Author :
Fang, Hai-Tao ; Cao, Xi-Ren
Author_Institution :
Lab. of Syst. & Control, Acad. of Math. & Syst. Sci., Beijing, China
Volume :
49
Issue :
4
fYear :
2004
fDate :
4/1/2004
Firstpage :
493
Lastpage :
505
Abstract :
Performance potentials play a crucial role in performance sensitivity analysis and policy iteration of Markov decision processes. The potentials can be estimated on a single sample path of a Markov process. In this paper, we propose two potential-based online policy iteration algorithms for performance optimization of Markov systems. The algorithms are based on online estimation of potentials and stochastic approximation. We prove that with these two algorithms the optimal policy can be attained after a finite number of iterations. A simulation example is given to illustrate the main ideas and the convergence rates of the algorithms.
Keywords :
Markov processes; convergence of numerical methods; gradient methods; iterative methods; optimisation; recursive estimation; sensitivity analysis; Markov decision processes; algorithm convergence rates; gradient-based online approach; online policy iteration algorithms; recursive optimization; sensitivity analysis; single-sample-path estimation; stochastic approximation; Approximation algorithms; Communication networks; Convergence; Markov processes; Optimization; Sensitivity analysis; State-space methods; Steady-state; Stochastic processes; Transportation;
fLanguage :
English
Journal_Title :
Automatic Control, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9286
Type :
jour
DOI :
10.1109/TAC.2004.825647
Filename :
1284713