Title :
Reinforcement learning with state-dependent discount factor
Author :
Yoshida, Norihiro ; Uchibe, Eiji ; Doya, Kenji
Author_Institution :
Nara Inst. of Sci. & Technol. (NAIST), Nara, Japan
Abstract :
Conventional reinforcement learning algorithms have several parameters, called meta-parameters, that determine the characteristics of the learning process. In this study, we focus on the discount factor, which sets the time scale of the tradeoff between immediate and delayed rewards. The discount factor is usually treated as a constant, but we introduce a state-dependent discount function together with a new optimization criterion for reinforcement learning. We first derive a new algorithm under this criterion, named ExQ-learning, and prove that it converges with probability 1 to the action-value function that is optimal in the sense of the new criterion. We then present a framework for optimizing the discount factor and the discount function with an evolutionary algorithm. To validate the proposed method, we conduct a simple computer simulation and show that the proposed algorithm can find an appropriate state-dependent discount function that performs better than a constant discount factor.
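For intuition, the sketch below shows how a state-dependent discount could enter an ordinary tabular Q-learning update: the constant discount is simply replaced by a function gamma(s') evaluated at the successor state. This is an illustrative assumption only, not the paper's ExQ-learning algorithm or its optimization criterion; the toy environment (ChainEnv), the discount function gamma_fn, and all hyperparameters are hypothetical.

import random

class ChainEnv:
    """Hypothetical toy 5-state chain: move left/right, reward 1.0 at the rightmost state."""
    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):  # action: 0 = left, 1 = right
        if action == 0:
            self.state = max(0, self.state - 1)
        else:
            self.state = min(self.n_states - 1, self.state + 1)
        reward = 1.0 if self.state == self.n_states - 1 else 0.0
        done = self.state == self.n_states - 1
        return self.state, reward, done

def gamma_fn(state):
    """Hypothetical state-dependent discount: discounts more heavily near the start state."""
    return 0.5 + 0.4 * state / 4.0

def q_learning(env, episodes=500, alpha=0.1, epsilon=0.1):
    Q = [[0.0, 0.0] for _ in range(env.n_states)]
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection.
            if random.random() < epsilon:
                a = random.randrange(2)
            else:
                a = max(range(2), key=lambda i: Q[s][i])
            s2, r, done = env.step(a)
            # The state-dependent discount enters the target exactly where a
            # constant gamma would: gamma(s') scales the bootstrapped value of s'.
            target = r + (0.0 if done else gamma_fn(s2) * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

if __name__ == "__main__":
    print(q_learning(ChainEnv()))

In the framework described in the abstract, such a discount function would itself be parameterized and tuned by an evolutionary algorithm rather than fixed by hand as in this sketch.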
Keywords :
learning (artificial intelligence); optimisation; ExQ-learning; computer simulation; constant discount factor; evolutionary algorithm; meta-parameters; optimal action-value function; optimization criterion; reinforcement learning algorithms; state-dependent discount factor; state-dependent discount function; Convergence; Equations; Green products; Learning (artificial intelligence); Linear programming; Mathematical model; Robots;
Conference_Title :
2013 IEEE Third Joint International Conference on Development and Learning and Epigenetic Robotics (ICDL)
Conference_Location :
Osaka
DOI :
10.1109/DevLrn.2013.6652533