Title :
Comparing different methods to speed up reinforcement learning in a complex domain
Author :
Riedmiller, Martin ; Withopf, Daniel
Author_Institution :
Osnabrück Univ., Germany
Abstract :
We introduce a new learning algorithm, the semi-DP algorithm, designed for MDPs (Markov decision processes) in which each action leads either to a deterministic successor state or to the terminal state. The algorithm needs only a finite number of loops to converge exactly to the optimal action-value function. We compare this algorithm, together with three other methods that speed up or simplify the learning process, to ordinary Q-learning in a soccer grid-world. Furthermore, we show that different reward functions can considerably change the convergence time of the learning algorithms even if the optimal policy remains unchanged.
Keywords :
Markov processes; decision theory; learning (artificial intelligence); Markov decision process; Q-learning; SMDP homomorphism; learning algorithm; learning process; optimal action-value function; reinforcement learning; semi-DP algorithm; soccer grid-world; Algorithm design and analysis; Convergence; Learning; Robots; MDP; Q-Learning; Reinforcement Learning; SMDP; SMDP homomorphisms; options
Conference_Title :
2005 IEEE International Conference on Systems, Man and Cybernetics
Print_ISBN :
0-7803-9298-1
DOI :
10.1109/ICSMC.2005.1571636