Title :
Q_learning based on active backup and memory mechanism
Author :
Liu, Yang ; Guo, Mao-zu ; Yao, Hong-Xun
Author_Institution :
Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., China
Abstract :
Exploration is used in Q_learning because blind exploitation can trap the agent in locally optimal policies. However, excessive exploration degrades the performance of Q_learning, and it is difficult to strike the right trade-off between exploration and exploitation. Active backup is introduced into Q_learning, and a corresponding algorithm, AB_Q_learning, based on the Dijkstra backup from dynamic programming, is proposed. A memory-mechanism-based algorithm, MEAB_Q_learning, is then given so that the agent can learn in a completely unknown environment. The experimental results show that both algorithms not only converge more quickly but also overcome the problem of local optima.
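The exploration/exploitation trade-off the abstract starts from can be illustrated with plain tabular Q_learning and epsilon-greedy action selection. The sketch below is only that baseline, not AB_Q_learning or MEAB_Q_learning, whose active-backup and memory details are not given here. The 5-state chain world, the step() environment, and the parameter values (alpha, gamma, epsilon, episode count) are hypothetical choices for illustration.

```python
import random
from collections import defaultdict

N_STATES = 5          # hypothetical chain world: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)    # move left or move right

def step(state, action):
    """Toy environment (assumption): reward 1 only when the goal is reached."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    # Break ties randomly so an all-zero table does not always pick the same action.
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])

def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # Q(s, a), initialised to 0
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state, epsilon, rng)
            nxt, reward, done = step(state, action)
            # One-step Q-learning target; a terminal state has no successor value.
            target = reward if done else reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
    return q

q = q_learning()
greedy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(greedy)
```

With epsilon = 0 the agent only ever follows its current estimates, while a large epsilon spends most steps on random moves; the paper's algorithms target exactly this tension.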
Keywords :
back-up procedures; dynamic programming; learning (artificial intelligence); Q_learning; active backup; blind exploitation; dynamic programming; memory mechanism; Computer science; Cybernetics; Degradation; Dynamic programming; Heuristic algorithms; Machine learning; Robots; Unsupervised learning;
Conference_Title :
Machine Learning and Cybernetics, 2004. Proceedings of 2004 International Conference on
Print_ISBN :
0-7803-8403-2
DOI :
10.1109/ICMLC.2004.1380677