DocumentCode
260235
Title
Improving the performance of Q-learning using simultaneous Q-values updating
Author
Pouyan, Maryam ; Mousavi, Amin ; Golzari, Shahram ; Hatam, Ahmad
Author_Institution
Electr. & Comput. Eng. Dept., Hormozgan Univ., Bandarabbas, Iran
fYear
2014
fDate
26-27 Nov. 2014
Firstpage
1
Lastpage
6
Abstract
Q-learning is one of the best model-free reinforcement learning algorithms. Its goal is to estimate the optimal action-value function, called the Q-value function, which is defined as the expected sum of future rewards obtained by taking an action in the current state. The main drawback of Q-learning is that the learning process is expensive for the agent, especially in the early steps, because every state-action pair must be visited frequently for the algorithm to converge to the optimal policy. In this paper, the concept of opposite action is used to improve the performance of the Q-learning algorithm, especially in the early steps of learning. Opposite actions suggest updating two Q-values simultaneously: the agent updates the Q-value for each action and for its corresponding opposite action, thereby increasing the speed of learning. The novel Q-learning method based on the concept of opposite action is simulated on the well-known grid world test-bed problem. The results show that the proposed method improves the learning process.
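The simultaneous update described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state how the reward and next state for the opposite action are obtained, so here they are passed in by the caller as the hypothetical parameters opp_r and opp_s_next. The action encoding, the OPPOSITE table, and the function names are likewise assumptions made for the example.

```python
import numpy as np

# Assumed grid-world action encoding: 0=up, 1=down, 2=left, 3=right.
# OPPOSITE maps each action to its assumed opposite (up<->down, left<->right).
OPPOSITE = {0: 1, 1: 0, 2: 3, 3: 2}


def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update for one (state, action) pair."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])


def simultaneous_update(Q, s, a, r, s_next, opp_r, opp_s_next,
                        alpha=0.1, gamma=0.9):
    """Update Q for the taken action and for its opposite action.

    opp_r and opp_s_next are hypothetical inputs: how the opposite
    action's reward and successor state are determined is an assumption
    left to the caller, since the abstract does not specify it.
    """
    q_update(Q, s, a, r, s_next, alpha, gamma)
    q_update(Q, s, OPPOSITE[a], opp_r, opp_s_next, alpha, gamma)
```

With a Q-table such as np.zeros((n_states, 4)), one call to simultaneous_update changes two entries of the row for state s per environment step instead of one, which is the source of the faster early learning that the abstract claims.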
Keywords
learning (artificial intelligence); optimisation; Q-learning; Q-value function; optimal action-value function; reinforcement learning algorithm; Computational intelligence; Computers; Convergence; Educational institutions; Knowledge engineering; Learning (artificial intelligence); Standards; Q-learning; estimate value; opposite action; reinforcement learning
fLanguage
English
Publisher
IEEE
Conference_Titel
Technology, Communication and Knowledge (ICTCK), 2014 International Congress on
Conference_Location
Mashhad
Type
conf
DOI
10.1109/ICTCK.2014.7033528
Filename
7033528