• DocumentCode
    2673395
  • Title
    An improvement of policy gradient estimation algorithms
  • Author
    Li, Yanjie; Cao, Fang; Cao, Xi-Ren
  • Author_Institution
    Dept. of Electron. & Comput. Eng., Hong Kong Univ. of Sci. & Technol., Hong Kong
  • fYear
    2008
  • fDate
    28-30 May 2008
  • Firstpage
    168
  • Lastpage
    172
  • Abstract
    In this paper, we discuss the problem of sample-path-based (on-line) performance gradient estimation for Markov systems. Existing on-line performance gradient estimation algorithms generally require a standard importance sampling assumption; when this assumption does not hold, they may yield poor gradient estimates. We show that the assumption can be relaxed, and we propose several algorithms that provide performance gradient estimates for systems that do not satisfy it. Simulation examples are given to illustrate the accuracy of the estimates.
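
    Note: the "standard importance sampling assumption" referred to above is the requirement that P(i, j) > 0 whenever dP(i, j)/dtheta != 0, so that the likelihood ratio [dP(i, j)/dtheta] / P(i, j) is defined for every transition that contributes to the gradient. The sketch below (Python) shows a conventional likelihood-ratio, potential-based estimator of this kind and marks where the assumption enters. It is illustrative only, not the paper's proposed algorithm; the function name, the truncation length K, and the truncated-sum potential estimator are assumptions of this sketch.

        import numpy as np

        def lr_gradient_estimate(P, dP, f, path, eta_hat, K=50):
            """Estimate d(eta)/d(theta) from one sample path X_0, ..., X_N.

            P       : (S, S) numpy transition matrix at the current theta
            dP      : (S, S) elementwise derivatives dP(i, j)/dtheta
            f       : length-S one-step reward vector
            path    : state sequence generated under P, len(path) > K + 1
            eta_hat : estimate of the average reward eta = pi . f
            K       : truncation length for the potential estimate
                      g(j) ~ E[ sum_{k=0}^{K-1} (f(X_k) - eta) | X_0 = j ]
            """
            N = len(path) - 1
            grad = 0.0
            for n in range(N - K):
                i, j = path[n], path[n + 1]
                # Truncated-sum estimate of the potential g(X_{n+1}),
                # read off the observed path starting at time n + 1.
                g_hat = sum(f[path[n + 1 + k]] - eta_hat for k in range(K))
                # Likelihood-ratio weight: since (i, j) occurred, P[i, j] > 0.
                # Pairs with P[i, j] = 0 but dP[i, j] != 0 are never visited
                # by the sample path, so their contribution to d(eta)/d(theta)
                # is silently lost. This is exactly why the standard
                # assumption is needed, and it is the case the paper's
                # relaxed estimators are designed to handle.
                grad += (dP[i, j] / P[i, j]) * g_hat
            return grad / (N - K)

    When the assumption holds, the per-step term has expectation sum_i pi(i) sum_j dP(i, j) g(j), which is the standard perturbation-analysis gradient formula d(eta)/d(theta) = pi (dP/dtheta) g, with g the performance potentials solving the Poisson equation.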
  • Keywords
    Markov processes; discrete event systems; gradient methods; Markov systems; discrete event dynamic system; perturbation analysis; policy gradient estimation algorithms; sample-path-based performance gradient estimation; Degradation; Dynamic programming; Monte Carlo methods; Optimization; Performance analysis; Poisson equations; State-space methods; Steady-state; Markov chain; on-line estimation; performance potentials; policy gradient
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    9th International Workshop on Discrete Event Systems (WODES 2008)
  • Conference_Location
    Göteborg, Sweden
  • Print_ISBN
    978-1-4244-2592-1
  • Electronic_ISBN
    978-1-4244-2593-8
  • Type
    conf
  • DOI
    10.1109/WODES.2008.4605940
  • Filename
    4605940