DocumentCode :
1696453
Title :
RVI reinforcement learning for semi-Markov decision processes with average reward
Author :
Li, Yanjie ; Cao, Fang
Author_Institution :
Shenzhen Grad. Sch., Div. of Control & Mechatron. Eng., Harbin Inst. of Technol., Harbin, China
fYear :
2010
Firstpage :
1674
Lastpage :
1679
Abstract :
Using the sensitivity-based approach, we study the reinforcement learning problem for semi-Markov decision processes (SMDPs) under the average-reward criterion. First, we present a new Bellman optimality equation. On this basis, we propose a relative value iteration (RVI) reinforcement learning algorithm. The new algorithm can avoid estimating the optimal average reward during learning and has a good convergence rate.
Keywords :
Markov processes; iterative methods; learning (artificial intelligence); optimisation; Bellman optimality equation; RVI reinforcement learning algorithm; SMDP; convergence rate; optimal average reward; reinforcement learning problem; relative value iteration reinforcement learning algorithm; semi-Markov decision processes; sensitivity-based approach; Algorithm design and analysis; Convergence; Equations; Estimation; Heuristic algorithms; Learning; Markov processes; Performance potential; Reinforcement learning; Relative value iteration; Semi-Markov decision processes
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Intelligent Control and Automation (WCICA), 2010 8th World Congress on
Conference_Location :
Jinan
Print_ISBN :
978-1-4244-6712-9
Type :
conf
DOI :
10.1109/WCICA.2010.5554785
Filename :
5554785