DocumentCode :
677612
Title :
Relative value iteration for average reward semi-Markov control via simulation
Author :
Gosavi, Abhijit
Author_Institution :
Dept. of Eng. Manage. & Syst. Eng., Missouri Univ. of Sci. & Technol., Rolla, MO, USA
fYear :
2013
fDate :
8-11 Dec. 2013
Firstpage :
623
Lastpage :
630
Abstract :
This paper studies the semi-Markov decision process (SMDP) under the long-run average reward criterion in a simulation-based context. In dynamic programming, a straightforward approach to this problem is policy iteration; a value iteration approach requires a transformation that imposes an additional computational burden. In the simulation-based context, however, where one seeks to avoid the transition probabilities needed in dynamic programming, value iteration offers the more convenient solution route. Hence, in this paper we present (to the best of our knowledge, for the first time) a relative value iteration algorithm for solving average reward SMDPs via simulation. The algorithm is a semi-Markov extension of an algorithm in the literature for the Markov decision process. Our numerical results with the new algorithm are very encouraging.
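A minimal illustrative sketch of what such a simulation-based relative value iteration might look like is given below, in Python. The toy two-state SMDP, the epsilon-greedy exploration, the diminishing step size, and in particular the normalization of the reference-pair Q-value by its mean sojourn time are all assumptions made here for illustration; the paper's exact update rule may differ.

    # Sketch: simulation-based relative value iteration (Q-learning style)
    # for an average-reward SMDP. The toy model and the update's relative
    # term are illustrative assumptions, not the paper's exact algorithm.
    import random

    random.seed(0)

    # Hypothetical 2-state, 2-action SMDP: transition probabilities,
    # lump-sum rewards, and mean sojourn times are placeholders.
    P = {(0, 0): [0.7, 0.3], (0, 1): [0.4, 0.6],
         (1, 0): [0.5, 0.5], (1, 1): [0.2, 0.8]}
    R = {(0, 0): 6.0, (0, 1): 4.0, (1, 0): -5.0, (1, 1): 12.0}
    T = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 1.5, (1, 1): 3.0}

    Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
    ref = (0, 0)          # distinguished reference state-action pair
    state = 0

    for k in range(1, 200001):
        alpha = 100.0 / (1000.0 + k)      # diminishing step size
        a = random.choice((0, 1)) if random.random() < 0.1 \
            else max((0, 1), key=lambda b: Q[(state, b)])
        nxt = 0 if random.random() < P[(state, a)][0] else 1
        r = R[(state, a)]
        t = random.expovariate(1.0 / T[(state, a)])  # random sojourn time
        # Relative update: the reference Q-value, scaled by its mean
        # sojourn time, stands in for the unknown average reward rho
        # (an assumed normalization for the semi-Markov case).
        rho_hat = Q[ref] / T[ref]
        target = r - rho_hat * t + max(Q[(nxt, b)] for b in (0, 1))
        Q[(state, a)] += alpha * (target - Q[(state, a)])
        state = nxt

    print({sa: round(v, 2) for sa, v in Q.items()})
    print("greedy policy:",
          {s: max((0, 1), key=lambda b: Q[(s, b)]) for s in (0, 1)})

The point carried over from the MDP case is that subtracting a term anchored to a fixed reference state-action pair keeps the iterates from drifting, which is the role the relative term plays in relative value iteration; no separate transformation of the SMDP is needed, and no transition probabilities are used.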
Keywords :
Markov processes; dynamic programming; iterative methods; probability; simulation; average reward SMDPs; average reward semi-Markov control; dynamic programming; long-run average reward criterion; policy iteration; relative value iteration algorithm; semi-Markov decision process; simulation-based context; transition probabilities; value iteration approach; Algorithm design and analysis; Context; Dynamic programming; Equations; Markov processes; Mathematical model; Modeling;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
2013 Winter Simulation Conference (WSC)
Conference_Location :
Washington, DC
Print_ISBN :
978-1-4799-2077-8
Type :
conf
DOI :
10.1109/WSC.2013.6721456
Filename :
6721456