• DocumentCode
    954877
  • Title
    Potential-based online policy iteration algorithms for Markov decision processes
  • Author
    Fang, Hai-Tao; Cao, Xi-Ren
  • Author_Institution
    Lab. of Syst. & Control, Acad. of Math. & Syst. Sci., Beijing, China
  • Volume
    49
  • Issue
    4
  • fYear
    2004
  • fDate
    4/1/2004
  • Firstpage
    493
  • Lastpage
    505
  • Abstract
    Performance potentials play a crucial role in performance sensitivity analysis and policy iteration of Markov decision processes. The potentials can be estimated on a single sample path of a Markov process. In this paper, we propose two potential-based online policy iteration algorithms for performance optimization of Markov systems. The algorithms are based on online estimation of potentials and stochastic approximation. We prove that with these two algorithms the optimal policy can be attained after a finite number of iterations. A simulation example is given to illustrate the main ideas and the convergence rates of the algorithms.
  • Keywords
    Markov processes; convergence of numerical methods; gradient methods; iterative methods; optimisation; recursive estimation; sensitivity analysis; Markov decision processes; algorithm convergence rates; gradient-based online approach; online policy iteration algorithms; recursive optimization; single-sample-path estimation; stochastic approximation; Approximation algorithms; Communication networks; Convergence; Optimization; State-space methods; Steady-state; Stochastic processes; Transportation
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Automatic Control
  • Publisher
    IEEE
  • ISSN
    0018-9286
  • Type
    jour
  • DOI
    10.1109/TAC.2004.825647
  • Filename
    1284713