  • DocumentCode
    1696453
  • Title
    RVI reinforcement learning for semi-Markov decision processes with average reward
  • Author
    Li, Yanjie; Cao, Fang
  • Author_Institution
    Shenzhen Grad. Sch., Div. of Control & Mechatron. Eng., Harbin Inst. of Technol., Harbin, China
  • fYear
    2010
  • Firstpage
    1674
  • Lastpage
    1679
  • Abstract
    Using the sensitivity-based approach, we study the reinforcement learning problem for semi-Markov decision processes (SMDPs) under the average-reward criterion. First, we provide a new Bellman optimality equation. On this basis, we propose a relative value iteration (RVI) reinforcement learning algorithm. The new RVI algorithm can avoid estimating the optimal average reward during learning and has a good convergence rate.
  • Keywords
    Markov processes; iterative methods; learning (artificial intelligence); optimisation; Bellman optimality equation; RVI reinforcement learning algorithm; SMDP; convergence rate; optimal average reward; reinforcement learning problem; relative value iteration reinforcement learning algorithm; semi-Markov decision processes; sensitivity-based approach; Algorithm design and analysis; Convergence; Equations; Estimation; Heuristic algorithms; Learning; Markov processes; Performance potential; Reinforcement learning; Relative value iteration; Semi-Markov decision processes
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Control and Automation (WCICA), 2010 8th World Congress on
  • Conference_Location
    Jinan
  • Print_ISBN
    978-1-4244-6712-9
  • Type
    conf
  • DOI
    10.1109/WCICA.2010.5554785
  • Filename
    5554785