RVI reinforcement learning for semi-Markov decision processes with average reward

Author

Li, Yanjie ; Cao, Fang

Author_Institution

Shenzhen Grad. Sch., Div. of Control & Mechatron. Eng., Harbin Inst. of Technol., Harbin, China

fYear

2010

Firstpage

1674

Lastpage

1679

Abstract

Using the sensitivity-based approach, we study the reinforcement learning problem for semi-Markov decision processes (SMDPs) under the average-reward criterion. We first derive a new Bellman optimality equation. On this basis, we propose a relative value iteration (RVI) reinforcement learning algorithm. The new algorithm can avoid estimating the optimal average reward during learning and has a good convergence rate.
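The idea behind relative value iteration can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not state the new Bellman optimality equation or the learning rule, so the code below shows only textbook, model-based RVI on an ordinary (discrete-time) average-reward MDP. The two-state model `P`, `R`, the function name `relative_value_iteration`, and the choice of reference state are all assumptions made for illustration. The point it shows is that subtracting the backed-up value of a fixed reference state at every sweep keeps the iterates bounded and yields the average reward as a by-product, with no separate estimate of it.

```python
def relative_value_iteration(P, R, ref_state=0, tol=1e-12, max_iter=10_000):
    """Textbook RVI for an average-reward MDP (illustrative, not the paper's method).

    P[s][a] is a list of next-state probabilities, R[s][a] the expected reward.
    Returns (gain, h): the average-reward estimate and the relative values,
    normalised so that h[ref_state] == 0.
    """
    n_states = len(P)
    h = [0.0] * n_states
    gain = 0.0
    for _ in range(max_iter):
        # Bellman backup: best one-step reward plus expected relative value.
        backup = [
            max(
                R[s][a] + sum(p * h[j] for j, p in enumerate(P[s][a]))
                for a in range(len(P[s]))
            )
            for s in range(n_states)
        ]
        # The backup at the reference state serves as the running gain estimate.
        gain = backup[ref_state]
        new_h = [b - gain for b in backup]
        diff = max(abs(x - y) for x, y in zip(new_h, h))
        h = new_h
        if diff < tol:
            break
    return gain, h


# Hypothetical two-state, two-action model (all entries are made up).
P = [
    [[0.9, 0.1], [0.2, 0.8]],  # state 0: action 0, action 1
    [[0.1, 0.9], [0.8, 0.2]],  # state 1: action 0, action 1
]
R = [
    [1.0, 0.0],  # rewards in state 0
    [2.0, 0.0],  # rewards in state 1
]

gain, h = relative_value_iteration(P, R)
print(gain, h)
```

For this toy model the optimal policy moves from state 0 to state 1 and stays there, and solving the average-reward optimality equation by hand gives a gain of 16/9 with relative values (0, 20/9), which the iteration should reproduce up to the tolerance.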

Keywords

Markov processes; iterative methods; learning (artificial intelligence); optimisation; Bellman optimality equation; RVI reinforcement learning algorithm; SMDP; convergence rate; optimal average reward; reinforcement learning problem; relative value iteration reinforcement learning algorithm; semi-Markov decision processes; sensitivity-based approach; Algorithm design and analysis; Convergence; Equations; Estimation; Heuristic algorithms; Learning; Performance potential; Reinforcement learning; Relative value iteration

fLanguage

English

Publisher

IEEE

Conference_Title

2010 8th World Congress on Intelligent Control and Automation (WCICA)

Conference_Location

Jinan

Print_ISBN

978-1-4244-6712-9

Type

conf

DOI

10.1109/WCICA.2010.5554785

Filename

5554785