DocumentCode :
2673395
Title :
An improvement of policy gradient estimation algorithms
Author :
Li, Yanjie ; Cao, Fang ; Cao, Xi-Ren
Author_Institution :
Dept. of Electron. & Comput. Eng., Hong Kong Univ. of Sci. & Technol., Hong Kong
fYear :
2008
fDate :
28-30 May 2008
Firstpage :
168
Lastpage :
172
Abstract :
In this paper, we discuss the problem of sample-path-based (on-line) performance gradient estimation for Markov systems. Existing on-line performance gradient estimation algorithms generally require a standard importance-sampling assumption; when this assumption does not hold, these algorithms may produce poor gradient estimates. We show that the assumption can be relaxed, and we propose several algorithms that provide performance gradient estimates for systems that do not satisfy it. Simulation examples illustrate the accuracy of the estimates.
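Illustrative sketch (not the authors' proposed algorithm): the following minimal Python example shows the standard potential-based gradient formula d(eta)/d(theta) = pi' * (dP/d(theta)) * g that sample-path-based estimators of this kind build on, with the potentials g approximated by truncated sums of centered rewards along a single sample path. The 3-state chain P(theta), the reward vector f, the truncation length K, and all function names are hypothetical choices made for this sketch.

# A minimal sketch of sample-path-based performance-gradient estimation
# for an ergodic Markov chain, using the potential-based formula
#   d(eta)/d(theta) = pi' * (dP/d(theta)) * g.
# The chain, rewards, and truncation length K are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def P(theta):
    # Hypothetical 3-state transition matrix parameterized by theta.
    return np.array([
        [0.5 - theta, 0.3 + theta, 0.2],
        [0.2,         0.5,         0.3],
        [0.3 + theta, 0.2,         0.5 - theta],
    ])

def dP(theta):
    # Exact derivative of P with respect to theta for this example.
    return np.array([
        [-1.0, 1.0,  0.0],
        [ 0.0, 0.0,  0.0],
        [ 1.0, 0.0, -1.0],
    ])

f = np.array([1.0, 2.0, 3.0])  # per-state reward

def estimate_gradient(theta, n_steps=100_000, K=50):
    # Estimate d(eta)/d(theta) from one sample path of the chain P(theta).
    # Potentials are approximated by truncated sums of centered rewards:
    #   g(i) ~ E[ sum_{t=0}^{K-1} (f(X_t) - eta) | X_0 = i ].
    Pt = P(theta)
    n = Pt.shape[0]
    # Simulate a single sample path.
    path = np.empty(n_steps, dtype=int)
    x = 0
    for t in range(n_steps):
        path[t] = x
        x = rng.choice(n, p=Pt[x])
    rewards = f[path]
    eta = rewards.mean()            # average-reward estimate
    centered = rewards - eta
    # Average truncated sums started at every visit to each state i.
    cum = np.concatenate(([0.0], np.cumsum(centered)))
    sums = np.zeros(n)
    counts = np.zeros(n)
    for t in range(n_steps - K):
        i = path[t]
        sums[i] += cum[t + K] - cum[t]
        counts[i] += 1
    g = sums / np.maximum(counts, 1)
    pi = counts / counts.sum()      # stationary distribution from visit frequencies
    return pi @ dP(theta) @ g

print("estimated gradient:", estimate_gradient(0.1))

As a sanity check, the estimate can be compared against a finite-difference approximation of the stationary average reward at nearby values of theta; shifting g by a constant does not change the result, since each row of dP/d(theta) sums to zero.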
Keywords :
Markov processes; discrete event systems; gradient methods; Markov systems; discrete event dynamic system; perturbation analysis; policy gradient estimation algorithms; sample-path-based performance gradient estimation; Degradation; Dynamic programming; Monte Carlo methods; Optimization; Performance analysis; Poisson equations; State-space methods; Steady-state; Markov chain; on-line estimation; performance potentials; policy gradient
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
9th International Workshop on Discrete Event Systems (WODES 2008)
Conference_Location :
Göteborg, Sweden
Print_ISBN :
978-1-4244-2592-1
Electronic_ISBN :
978-1-4244-2593-8
Type :
conf
DOI :
10.1109/WODES.2008.4605940
Filename :
4605940