• DocumentCode
    2641763
  • Title

    Infinite-Horizon Policy-Gradient Estimation with Variable Discount Factor for Markov Decision Process

  • Author

    Bao, Bing-Kun ; Yin, Bao-Qun ; Xi, Hong-sheng

  • Author_Institution
    Dept. of Autom., China Univ. of Sci. & Technol., Hefei
  • fYear
    2008
  • fDate
    18-20 June 2008
  • Firstpage
    584
  • Lastpage
    584
  • Abstract
    This paper proposes a novel infinite-horizon policy-gradient estimation method with a variable discount factor. The method addresses the imbalance between bias and variance that limits standard policy-gradient estimation methods by using an incremental sequence as the discount factor. Numerical experiments on a Markov decision process demonstrate its effectiveness.
  • Keywords
    Markov processes; decision theory; gradient methods; infinite horizon; Markov decision process; incremental sequence; infinite-horizon policy-gradient estimation; variable discount factor; Approximation algorithms; Automation; Computational modeling; Eigenvalues and eigenfunctions; Optimization methods; State estimation; State-space methods; Stochastic processes
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Innovative Computing Information and Control, 2008. ICICIC '08. 3rd International Conference on
  • Conference_Location
    Dalian, Liaoning
  • Print_ISBN
    978-0-7695-3161-8
  • Electronic_ISBN
    978-0-7695-3161-8
  • Type

    conf

  • DOI
    10.1109/ICICIC.2008.318
  • Filename
    4603773