DocumentCode
3563673
Title
Discounted UCB1-tuned for Q-learning
Author
Saito, Koki ; Notsu, Akira ; Honda, Katsuhiro
Author_Institution
Dept. of Comput. Sci. & Intell. Syst., Osaka Prefecture Univ., Sakai, Japan
fYear
2014
Firstpage
966
Lastpage
970
Abstract
Discounted UCB1-tuned was proposed as a method for choosing actions in the multi-armed bandit problem. The algorithm is an optimized selection method that balances exploration and exploitation by using a weighted value and a weighted variance. In this paper, we propose a method for applying Discounted UCB1-tuned to Q-learning and experimentally evaluate its performance on a shortest path problem in a continuous state space.
Keywords
estimation theory; learning (artificial intelligence); Q-learning; continuous state spaces shortest path problem; discounted UCB1-tuned; multi-armed bandit problem; Computer science; Computers; Damping; Learning (artificial intelligence); Shortest path problem; Standards; Vectors;
fLanguage
English
Publisher
IEEE
Conference_Titel
Soft Computing and Intelligent Systems (SCIS), 2014 Joint 7th International Conference on and Advanced Intelligent Systems (ISIS), 15th International Symposium on
Type
conf
DOI
10.1109/SCIS-ISIS.2014.7044672
Filename
7044672