• DocumentCode
    423876
  • Title

    Q_learning based on active backup and memory mechanism

  • Author

    Liu, Yang ; Guo, Mao-zu ; Yao, Hong-Xun

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., China
  • Volume
    1
  • fYear
    2004
  • fDate
    26-29 Aug. 2004
  • Firstpage
    271
  • Abstract
    Q_learning uses exploration because blind exploitation can trap the agent in locally optimal policies. However, excessive exploration degrades the performance of Q_learning, and it is difficult to balance exploration against exploitation. Active backup is introduced into Q_learning, and a corresponding algorithm, AB_Q_learning, based on Dijkstra backup in dynamic programming, is proposed. Then the memory-mechanism-based MEAB_Q_learning algorithm is given so that the agent can learn in a completely unknown environment. The experimental results show that these two algorithms not only converge more quickly but also overcome the problem of local optima.
  • Keywords
    back-up procedures; dynamic programming; learning (artificial intelligence); Q_learning; active backup; blind exploitation; memory mechanism; Computer science; Cybernetics; Degradation; Heuristic algorithms; Machine learning; Robots; Unsupervised learning
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics, 2004. Proceedings of 2004 International Conference on
  • Print_ISBN
    0-7803-8403-2
  • Type

    conf

  • DOI
    10.1109/ICMLC.2004.1380677
  • Filename
    1380677