DocumentCode
2673395
Title
An improvement of policy gradient estimation algorithms
Author
Li, Yanjie ; Cao, Fang ; Cao, Xi-Ren
Author_Institution
Dept. of Electron. & Comput. Eng., Hong Kong Univ. of Sci. & Technol., Hong Kong
fYear
2008
fDate
28-30 May 2008
Firstpage
168
Lastpage
172
Abstract
In this paper, we discuss the problem of sample-path-based (on-line) performance gradient estimation for Markov systems. Existing on-line performance gradient estimation algorithms generally require a standard importance-sampling assumption; when this assumption does not hold, they may produce poor gradient estimates. We show that the assumption can be relaxed, and we propose a few algorithms that provide performance gradient estimates for systems that do not satisfy it. Simulation examples are given to illustrate the accuracy of the estimates.
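For context, the following is a minimal sketch of a standard likelihood-ratio (importance-sampling-based) on-line policy gradient estimator of the kind discussed in the abstract; it is not the authors' improved algorithms. It assumes a finite-state Markov chain with a transition matrix parameterized by a scalar theta; the names P, dP, f, beta, and lr_gradient_estimate are illustrative placeholders. The division by P[x, y] is where the standard importance-sampling assumption enters: transitions with P(i, j) = 0 but dP(i, j)/dtheta != 0 are never sampled, so their contribution to the gradient is silently missed when the assumption fails.

import numpy as np

def lr_gradient_estimate(P, dP, f, x0, T, beta=0.99, seed=None):
    # Likelihood-ratio estimate of the derivative of the average reward
    # eta(theta) along a single sample path of length T.
    #   P    : (S, S) transition matrix under the current parameter theta
    #   dP   : (S, S) elementwise derivative dP/dtheta (rows sum to zero)
    #   f    : (S,) state reward vector
    #   beta : forgetting factor trading bias for variance in the trace
    rng = np.random.default_rng(seed)
    S = P.shape[0]
    x = x0
    eta_hat = 0.0   # running estimate of the average reward
    z = 0.0         # eligibility trace of likelihood-ratio increments
    grad = 0.0
    for t in range(1, T + 1):
        y = rng.choice(S, p=P[x])
        # This ratio requires P[x, y] > 0 wherever dP[x, y] != 0:
        # the standard importance-sampling assumption the paper relaxes.
        z = beta * z + dP[x, y] / P[x, y]
        eta_hat += (f[y] - eta_hat) / t
        grad += ((f[y] - eta_hat) * z - grad) / t
        x = y
    return grad

For example, with P = np.array([[0.9, 0.1], [0.2, 0.8]]), perturbation direction dP = np.array([[-1.0, 1.0], [0.0, 0.0]]) (rows sum to zero, as required of the derivative of a stochastic matrix), f = np.array([1.0, 0.0]), x0 = 0, and a long horizon T, the returned value approximates the derivative of the steady-state average reward in that direction.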
Keywords
Markov processes; discrete event systems; gradient methods; Markov systems; discrete event dynamic systems; perturbation analysis; policy gradient estimation algorithms; sample-path-based performance gradient estimation; Dynamic programming; Monte Carlo methods; Optimization; Performance analysis; Poisson equations; State-space methods; Steady-state; Markov chain; on-line estimation; performance potentials; policy gradient
fLanguage
English
Publisher
ieee
Conference_Titel
2008 9th International Workshop on Discrete Event Systems (WODES 2008)
Conference_Location
Göteborg, Sweden
Print_ISBN
978-1-4244-2592-1
Electronic_ISBN
978-1-4244-2593-8
Type
conf
DOI
10.1109/WODES.2008.4605940
Filename
4605940