Title :
Enhancing upper confidence bounds for trees with temporal difference values
Author :
Vodopivec, Tom ; Ster, Branko
Author_Institution :
Fac. of Comput. & Inf. Sci., Univ. of Ljubljana, Ljubljana, Slovenia
Abstract :
Upper confidence bounds for trees (UCT) is one of the most popular and generally effective Monte Carlo tree search (MCTS) algorithms. However, in practice it is relatively weak when not aided by additional enhancements. Improving its performance without reducing generality is a current research challenge. We introduce a new domain-independent UCT enhancement based on the theory of reinforcement learning. Our approach estimates state values in the UCT tree by employing temporal difference (TD) learning, which is known to outperform plain Monte Carlo sampling in certain domains. We present three adaptations of the TD(λ) algorithm to UCT's tree policy and backpropagation step. Evaluations on four games (Gomoku, Hex, Connect Four, and Tic-Tac-Toe) reveal that our approach increases UCT's level of play comparably to the rapid action value estimation (RAVE) enhancement. Furthermore, it proves highly compatible with a modified all-moves-as-first heuristic, where it considerably outperforms RAVE. The findings suggest that integration of TD learning into MCTS deserves further research, which may form a new class of MCTS enhancements.
Keywords :
Monte Carlo methods; backpropagation; game theory; tree searching; trees (mathematics); MCTS algorithms; Monte Carlo sampling; Monte Carlo tree search algorithms; RAVE enhancement; UCT tree policy; connect four game; domain-independent UCT enhancement; gomoku game; hex game; rapid action value estimation enhancement; reinforcement learning; state value estimation; temporal difference learning; temporal difference values; tic tac toe game; upper confidence bounds for trees; Benchmark testing; Complexity theory; Games; Optimization; Radiation detectors; Scalability; Sensitivity
Conference_Title :
Computational Intelligence and Games (CIG), 2014 IEEE Conference on
Conference_Location :
Dortmund
DOI :
10.1109/CIG.2014.6932895