DocumentCode
2641763
Title
Infinite-Horizon Policy-Gradient Estimation with Variable Discount Factor for Markov Decision Process
Author
Bao, Bing-Kun ; Yin, Bao-Qun ; Xi, Hong-sheng
Author_Institution
Dept. of Autom., China Univ. of Sci. & Technol., Hefei
fYear
2008
fDate
18-20 June 2008
Firstpage
584
Lastpage
584
Abstract
A novel infinite-horizon policy-gradient estimation method with a variable discount factor is proposed in this paper. The method addresses a limitation of standard policy-gradient estimation methods, namely the imbalance between bias and variance, by using an incremental sequence as the discount factor. Numerical experiments on a Markov decision process demonstrate its effectiveness.
Keywords
Markov processes; decision theory; gradient methods; infinite horizon; Markov decision process; incremental sequence; infinite-horizon policy-gradient estimation; variable discount factor; Approximation algorithms; Automation; Computational modeling; Eigenvalues and eigenfunctions; Optimization methods; State estimation; State-space methods; Stochastic processes
fLanguage
English
Publisher
IEEE
Conference_Titel
Innovative Computing Information and Control, 2008. ICICIC '08. 3rd International Conference on
Conference_Location
Dalian, Liaoning
Print_ISBN
978-0-7695-3161-8
Electronic_ISBN
978-0-7695-3161-8
Type
conf
DOI
10.1109/ICICIC.2008.318
Filename
4603773