• DocumentCode
    3133171
  • Title
    Policy iteration for parameterized Markov decision processes and its application
  • Author
    Li Xia; Qing-Shan Jia
  • Author_Institution
    Center for Intell. & Networked Syst. (CFINS), Dept. of Autom., Tsinghua Univ., Beijing, China
  • fYear
    2013
  • fDate
    23-26 June 2013
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    In a parameterized Markov decision process (MDP), the decision maker has to choose the optimal parameters that induce the maximal average system reward. However, the traditional policy iteration algorithm is usually inapplicable because the choice of parameters is not independent of the system state. In this paper, we use the direct comparison approach to study this problem. A general difference equation is derived to compare the system performance under different parameters. We derive a theoretical condition that guarantees the applicability of policy iteration to the parameterized MDP. This policy-iteration-type algorithm is much more efficient than gradient-based optimization algorithms for parameterized MDPs. Finally, we study the service rate control problem of closed Jackson networks as an example to demonstrate the main idea of this paper.
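    The following is a minimal sketch of a policy-iteration-type update in the spirit of the abstract, not the paper's exact algorithm. It assumes a finite-state MDP whose transition matrix P(θ) and reward vector r(θ) depend on a per-state parameter θ[s] drawn from a discretized candidate set, and that θ[s] only affects state s's transition row and reward (the kind of condition under which state-by-state policy improvement is valid). The interfaces P_of, r_of, and param_grid are illustrative assumptions.

```python
import numpy as np

def potentials(P, r):
    """Solve the Poisson equation for the performance potentials g and the
    average reward eta of an ergodic chain with transition matrix P."""
    n = P.shape[0]
    # Stationary distribution pi: solve pi P = pi with sum(pi) = 1.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    eta = pi @ r
    # Potentials from (I - P + 1 pi^T) g = r - eta 1.
    g = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), r - eta)
    return g, eta

def policy_iteration(P_of, r_of, param_grid, n_states, tol=1e-9, max_iter=100):
    """Policy-iteration-type search over a per-state parameter theta[s].
    P_of(theta) and r_of(theta) return the induced transition matrix and
    reward vector (hypothetical callables supplied by the user)."""
    theta = np.full(n_states, param_grid[0], dtype=float)
    eta_old = -np.inf
    for _ in range(max_iter):
        P, r = P_of(theta), r_of(theta)
        g, eta = potentials(P, r)
        if eta <= eta_old + tol:
            break  # no further improvement in average reward
        eta_old = eta
        # Improvement step: for each state pick the candidate parameter that
        # maximizes r(s; a) + P(s, :; a) @ g, as in standard policy iteration.
        new_theta = theta.copy()
        for s in range(n_states):
            best_val = -np.inf
            for a in param_grid:
                cand = theta.copy()
                cand[s] = a
                Pc, rc = P_of(cand), r_of(cand)
                val = rc[s] + Pc[s] @ g
                if val > best_val:
                    best_val, new_theta[s] = val, a
        theta = new_theta
    return theta, eta
```

    In a service rate control setting such as the closed Jackson network example, theta[s] would be the service rate chosen in state s, and each improvement step changes the rates state by state using the current potentials, rather than following a performance gradient.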
  • Keywords
    Markov processes; decision theory; iterative methods; closed Jackson networks; direct comparison approach; general difference equation; maximal average system reward; parameterized MDP; parameterized Markov decision processes; policy iteration type algorithm; service rate control problem; Difference equations; Markov processes; Mathematical model; Optimization; Servers; System performance; Vectors; Markov decision process; direct comparison; parameterized policy; policy iteration; service rate control
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    2013 9th Asian Control Conference (ASCC)
  • Conference_Location
    Istanbul
  • Print_ISBN
    978-1-4673-5767-8
  • Type
    conf
  • DOI
    10.1109/ASCC.2013.6606023
  • Filename
    6606023