• DocumentCode
    3563673
  • Title

    Discounted UCB1-tuned for Q-learning

  • Author

    Saito, Koki ; Notsu, Akira ; Honda, Katsuhiro

  • Author_Institution
    Dept. of Comput. Sci. & Intell. Syst., Osaka Prefecture Univ., Sakai, Japan
  • fYear
    2014
  • Firstpage
    966
  • Lastpage
    970
  • Abstract
    Discounted UCB1-tuned has been proposed as an action-selection method for the multi-armed bandit problem. The algorithm balances exploration and exploitation by using a weighted value and a weighted variance. In this paper, we propose a method for applying Discounted UCB1-tuned to Q-learning and experimentally evaluate its performance on a shortest path problem in a continuous state space.
  • Keywords
    estimation theory; learning (artificial intelligence); Q-learning; continuous state spaces shortest path problem; discounted UCB1-tuned; multi-armed bandit problem; Computer science; Computers; Damping; Learning (artificial intelligence); Shortest path problem; Standards; Vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Soft Computing and Intelligent Systems (SCIS), 2014 Joint 7th International Conference on and Advanced Intelligent Systems (ISIS), 15th International Symposium on
  • Type

    conf

  • DOI
    10.1109/SCIS-ISIS.2014.7044672
  • Filename
    7044672