• DocumentCode
    1946553
  • Title

    Improved Simultaneous Perturbation Stochastic Approximation and Its Application in Reinforcement Learning

  • Author

    Yue, Xiumei

  • Author_Institution
    Dept. of Electr. & Electron. Eng., Huangshi Inst. of Technol., Huangshi
  • Volume
    1
  • fYear
    2008
  • fDate
    12-14 Dec. 2008
  • Firstpage
    329
  • Lastpage
    332
  • Abstract
    In optimization problems where only measurements of the objective function are available, it is difficult or impossible to obtain the gradient of the objective function directly. The second-order simultaneous perturbation stochastic approximation (2SPSA) algorithm addresses this problem with an efficient gradient approximation that relies only on measurements of the objective function, but its accuracy depends on the conditioning of the objective function's Hessian matrix. To eliminate this dependence on the Hessian, this paper uses the nonlinear conjugate gradient method to determine the search direction. By combining different nonlinear conjugate gradient methods, it ensures that each search direction is a descent direction. In addition to improving the search direction, the paper uses inexact line searches to determine the step size. With a descent search direction and an appropriate step size, the improved SPSA converges faster than 2SPSA. The advantages of the improved SPSA are validated by applying it to reinforcement learning.
  • Keywords
    Hessian matrices; approximation theory; conjugate gradient methods; learning (artificial intelligence); stochastic processes; gradient approximation; matrix conditioning; nonlinear conjugate gradient method; objective function Hessian; reinforcement learning; simultaneous perturbation stochastic approximation algorithm; Acceleration; Application software; Approximation algorithms; Computer science; Convergence; Finite difference methods; Gradient methods; Learning; Software engineering; Stochastic processes; SPSA; nonlinear conjugate gradient method; reinforcement learning;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Software Engineering, 2008 International Conference on
  • Conference_Location
    Wuhan, Hubei
  • Print_ISBN
    978-0-7695-3336-0
  • Type

    conf

  • DOI
    10.1109/CSSE.2008.1019
  • Filename
    4721754
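
The gradient approximation from objective-function measurements that the abstract refers to can be illustrated with a minimal sketch of basic first-order SPSA. This is not the paper's improved algorithm: it has no conjugate gradient search direction and no line search. The function names (`spsa_gradient`, `spsa_minimize`), the gain-sequence constants, and the quadratic test objective are assumptions chosen for illustration only.

```python
import numpy as np

def spsa_gradient(f, theta, c, rng):
    """Estimate the gradient of f at theta from two measurements of f.

    All components of theta are perturbed at once by a random +/-1 vector,
    so the cost is two evaluations of f regardless of the dimension.
    """
    delta = rng.choice([-1.0, 1.0], size=theta.shape)  # Rademacher perturbation
    diff = f(theta + c * delta) - f(theta - c * delta)
    return diff / (2.0 * c * delta)

def spsa_minimize(f, theta0, iterations=2000, a=0.1, c=0.1,
                  A=100.0, alpha=0.602, gamma=0.101, seed=0):
    """Basic first-order SPSA with decaying gains (illustrative constants)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for k in range(iterations):
        a_k = a / (k + 1 + A) ** alpha   # step-size gain
        c_k = c / (k + 1) ** gamma       # perturbation size
        theta = theta - a_k * spsa_gradient(f, theta, c_k, rng)
    return theta

def quadratic(x):
    """Hypothetical noise-free test objective with its minimum at x = (1, ..., 1)."""
    return float(np.sum((x - 1.0) ** 2))
```

Calling `spsa_minimize(quadratic, [0.0, 0.0])` moves the iterate toward the minimizer at (1, 1). The fixed gain sequence used here is the part that the abstract's inexact line search would replace, and the raw gradient estimate is the part that its conjugate gradient direction would replace.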