Title :
Comparing different methods to speed up reinforcement learning in a complex domain
Author :
Riedmiller, Martin ; Withopf, Daniel
Author_Institution :
Osnabrück Univ., Germany
Abstract :
We introduce a new learning algorithm, the semi-DP algorithm, designed for MDPs (Markov decision processes) in which each action leads either to a deterministic successor state or to the terminal state. The algorithm needs only a finite number of loops to converge exactly to the optimal action-value function. We compare this algorithm, together with three other methods that speed up or simplify the learning process, to ordinary Q-learning in a soccer grid-world. Furthermore, we show that different reward functions can considerably change the convergence time of the learning algorithms even if the optimal policy remains unchanged.
Keywords :
Markov processes; decision theory; learning (artificial intelligence); Markov decision process; Q-learning; SMDP homomorphism; learning algorithm; learning process; optimal action-value function; reinforcement learning; semi-DP algorithm; soccer grid-world; Algorithm design and analysis; Convergence; Learning; Robots; MDP; Q-Learning; Reinforcement Learning; SMDP; SMDP homomorphisms; options
Conference_Title :
2005 IEEE International Conference on Systems, Man and Cybernetics
Print_ISBN :
0-7803-9298-1
DOI :
10.1109/ICSMC.2005.1571636