DocumentCode
423876
Title
Q_learning based on active backup and memory mechanism
Author
Liu, Yang; Guo, Mao-zu; Yao, Hong-Xun
Author_Institution
Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., China
Volume
1
fYear
2004
fDate
26-29 Aug. 2004
Firstpage
271
Abstract
Exploration is used in Q_learning because blind exploitation would trap the agent in locally optimal policies. However, excessive exploration degrades the performance of Q_learning, and the trade-off between exploration and exploitation is difficult to balance. Active backup is introduced into Q_learning, and the corresponding algorithm, AB_Q_learning, based on Dijkstra backup in dynamic programming, is proposed. Then the memory-mechanism-based MEAB_Q_learning algorithm is given so that the agent can learn in a completely unknown environment. The experimental results show that these two algorithms not only converge more quickly but also solve the problem of local optimization.
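For context, the sketch below shows standard tabular Q_learning with epsilon-greedy action selection, which illustrates the exploration/exploitation trade-off the abstract refers to. The paper's active-backup (AB_Q_learning) and memory-mechanism (MEAB_Q_learning) extensions are not detailed in the abstract, so they are not reproduced here; the environment interface and all parameter names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of tabular Q-learning with epsilon-greedy exploration.
# Assumes a hypothetical `env` exposing reset() -> state and
# step(action) -> (next_state, reward, done); all names are illustrative.
import random

def q_learning(env, n_states, n_actions,
               episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    # Q-table initialized to zero for every state-action pair
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy: explore with probability epsilon, else exploit
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: Q[s][i])
            s_next, r, done = env.step(a)
            # one-step backup of the sampled transition
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next
    return Q
```

With too little exploration (small epsilon) this update can settle on a locally optimal policy; with too much, learning slows, which is the trade-off the proposed backup and memory mechanisms are intended to ease.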
Keywords
back-up procedures; dynamic programming; learning (artificial intelligence); Q_learning; active backup; blind exploitation; memory mechanism; Computer science; Cybernetics; Degradation; Heuristic algorithms; Machine learning; Robots; Unsupervised learning
fLanguage
English
Publisher
ieee
Conference_Titel
Proceedings of 2004 International Conference on Machine Learning and Cybernetics
Print_ISBN
0-7803-8403-2
Type
conf
DOI
10.1109/ICMLC.2004.1380677
Filename
1380677
Link To Document