Improving the performance of Q-learning using simultaneous Q-values updating

Author

Pouyan, Maryam ; Mousavi, Amin ; Golzari, Shahram ; Hatam, Ahmad

Author_Institution

Electr. & Comput. Eng. Dept., Hormozgan Univ., Bandarabbas, Iran

fYear

2014

fDate

26-27 Nov. 2014

Firstpage

1

Lastpage

6

Abstract

Q-learning is one of the best model-free reinforcement learning algorithms. Its goal is to estimate the optimal action-value function, called the Q-value function, which is defined as the expected sum of future rewards obtained by taking an action in the current state. The main drawback of Q-learning is that the learning process is expensive for the agent, especially in the early steps, because every state-action pair must be visited frequently for the algorithm to converge to the optimal policy. In this paper, the concept of opposite action is used to improve the performance of Q-learning, especially in the early steps of learning. Opposite actions suggest updating two Q-values simultaneously: the agent updates the Q-value of each action and of the corresponding opposite action, thereby increasing the speed of learning. The proposed Q-learning method based on opposite actions is simulated on the well-known grid world test-bed problem. The results show that the proposed method improves the learning process.
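The idea of updating two Q-values per step can be illustrated with a minimal Python sketch. The abstract does not state how the opposite action's value is computed, so everything specific here is an assumption made for illustration: the four grid-world actions, the OPPOSITE mapping, the function names, the default alpha and gamma, and in particular the rule that the opposite action is credited with the negated reward and the same bootstrap term. The paper's actual update rule may differ.

```python
# Hypothetical sketch: the names, the opposite-action map and the opposite
# reward rule are illustrative assumptions, not details taken from the paper.
ACTIONS = ["up", "down", "left", "right"]
OPPOSITE = {"up": "down", "down": "up", "left": "right", "right": "left"}


def make_q_table(states):
    """Return a Q-table with every (state, action) value initialised to 0."""
    return {s: {a: 0.0 for a in ACTIONS} for s in states}


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Standard one-step Q-learning update for a single (state, action) pair."""
    target = reward + gamma * max(q[next_state].values())
    q[state][action] += alpha * (target - q[state][action])


def q_update_with_opposite(q, state, action, reward, next_state,
                           alpha=0.5, gamma=0.9):
    """Update the taken action and, in the same step, its opposite action.

    Assumption: the opposite action receives the negated reward and shares
    the bootstrap term of the taken action.
    """
    # Compute the bootstrap term once, before either Q-value is modified.
    bootstrap = gamma * max(q[next_state].values())
    q[state][action] += alpha * (reward + bootstrap - q[state][action])
    opp = OPPOSITE[action]
    q[state][opp] += alpha * (-reward + bootstrap - q[state][opp])


# Usage: one transition changes two entries of the table instead of one.
q = make_q_table(states=[0, 1])
q_update_with_opposite(q, state=0, action="right", reward=1.0, next_state=1)
```

Under these assumptions, a single experienced transition adjusts both the "right" and the "left" entries for state 0, which is the mechanism the abstract credits for faster learning in the early steps.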

Keywords

learning (artificial intelligence); optimisation; Q-learning; Q-value function; optimal action-value function; reinforcement learning algorithm; Computational intelligence; Computers; Convergence; Educational institutions; Knowledge engineering; Learning (artificial intelligence); Standards; Q-learning; estimate value; opposite action; reinforcement learning

fLanguage

English

Publisher

IEEE

Conference_Titel

Technology, Communication and Knowledge (ICTCK), 2014 International Congress on

Conference_Location

Mashhad

Type

conf

DOI

10.1109/ICTCK.2014.7033528

Filename

7033528