DocumentCode
1696453
Title
RVI reinforcement learning for semi-Markov decision processes with average reward
Author
Li, Yanjie ; Cao, Fang
Author_Institution
Shenzhen Grad. Sch., Div. of Control & Mechatron. Eng., Harbin Inst. of Technol., Harbin, China
fYear
2010
Firstpage
1674
Lastpage
1679
Abstract
Using the sensitivity-based approach, we study the reinforcement learning problem for semi-Markov decision processes (SMDPs) under the average-reward criterion. We first derive a new Bellman optimality equation and, on this basis, propose a relative value iteration (RVI) reinforcement learning algorithm. The new algorithm can avoid estimating the optimal average reward during learning and has a good convergence rate.
Keywords
Markov processes; iterative methods; learning (artificial intelligence); optimisation; Bellman optimality equation; RVI reinforcement learning algorithm; SMDP; convergence rate; optimal average reward; reinforcement learning problem; relative value iteration reinforcement learning algorithm; semi-Markov decision processes; sensitivity-based approach; Algorithm design and analysis; Convergence; Equations; Estimation; Heuristic algorithms; Learning; Performance potential; Reinforcement learning; Relative value iteration
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Control and Automation (WCICA), 2010 8th World Congress on
Conference_Location
Jinan
Print_ISBN
978-1-4244-6712-9
Type
conf
DOI
10.1109/WCICA.2010.5554785
Filename
5554785