DocumentCode
3133171
Title
Policy iteration for parameterized Markov decision processes and its application
Author
Li Xia ; Qing-Shan Jia
Author_Institution
Center for Intelligent and Networked Systems (CFINS), Department of Automation, Tsinghua University, Beijing, China
fYear
2013
fDate
23-26 June 2013
Firstpage
1
Lastpage
6
Abstract
In a parameterized Markov decision process (MDP), the decision maker must choose the parameter values that maximize the long-run average system reward. The traditional policy iteration algorithm is usually inapplicable, however, because the choice of parameters is not independent of the system state. In this paper, we use the direct comparison approach to study this problem. A general difference equation is derived to compare the system performance under different parameter settings, and from it we obtain a theoretical condition that guarantees the applicability of policy iteration to the parameterized MDP. The resulting policy-iteration-type algorithm is much more efficient than gradient-based optimization for parameterized MDPs. Finally, we study the service rate control problem of closed Jackson networks as an example to demonstrate the main idea of this paper.
Keywords
Markov processes; decision theory; iterative methods; closed Jackson networks; direct comparison approach; general difference equation; maximal average system reward; parameterized MDP; parameterized Markov decision processes; policy iteration type algorithm; service rate control problem; difference equations; mathematical model; optimization; servers; system performance; vectors; Markov decision process; direct comparison; parameterized policy; policy iteration; service rate control
fLanguage
English
Publisher
IEEE
Conference_Titel
2013 9th Asian Control Conference (ASCC)
Conference_Location
Istanbul
Print_ISBN
978-1-4673-5767-8
Type
conf
DOI
10.1109/ASCC.2013.6606023
Filename
6606023