DocumentCode :
3477252
Title :
Learning non-random moves for playing Othello: Improving Monte Carlo Tree Search
Author :
Robles, David ; Rohlfshagen, Philipp ; Lucas, Simon M.
Author_Institution :
Sch. of Comput. Sci. & Electr. Eng., Univ. of Essex, Colchester, UK
fYear :
2011
fDate :
Aug. 31 2011-Sept. 3 2011
Firstpage :
305
Lastpage :
312
Abstract :
Monte Carlo Tree Search (MCTS) with an appropriate tree policy may be used to approximate a minimax tree for games such as Go, where a state value function cannot be formulated easily: recent MCTS algorithms successfully combine Upper Confidence Bounds for Trees with Monte Carlo (MC) simulations to incrementally refine estimates on the game-theoretic values of the game's states. Although a game-specific value function is not required for this approach, significant improvements in performance may be achieved by derandomising the MC simulations using domain-specific knowledge. However, recent results suggest that the choice of a non-uniformly random default policy is non-trivial and may often lead to unexpected outcomes. In this paper we employ Temporal Difference Learning (TDL) as a general approach to the integration of domain-specific knowledge in MCTS and subsequently study its impact on the algorithm's performance. In particular, TDL is used to learn a linear function approximator that is used as an a priori bias to the move selection in the algorithm's default policy; the function approximator is also used to bias the values of the nodes in the tree directly. The goal of this work is to determine whether such a simplistic approach can be used to improve the performance of MCTS for the well-known board game Othello. The analysis of the results highlights the broader conclusions that may be drawn with respect to non-random default policies in general.
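The abstract's core idea, biasing the MCTS default policy with a TDL-learned linear function approximator, can be sketched as follows. This is only an illustrative reading of the abstract, not the paper's implementation: the softmax move selection, the TD(0)-style weight update, and all function names (`linear_value`, `biased_default_policy`, `td_update`) are assumptions chosen for the sketch.

```python
import math
import random

def linear_value(weights, features):
    """Linear function approximator: v(s) = w . phi(s)."""
    return sum(w * f for w, f in zip(weights, features))

def biased_default_policy(weights, moves, feature_fn, tau=1.0, rng=random):
    """Pick a simulation move with probability proportional to
    exp(v(s') / tau) over the successor states -- a softmax bias
    instead of uniformly random selection (one plausible way to
    realise the 'a priori bias' described in the abstract)."""
    values = [linear_value(weights, feature_fn(m)) for m in moves]
    peak = max(values)                      # subtract max for stability
    exps = [math.exp((v - peak) / tau) for v in values]
    r = rng.random() * sum(exps)
    acc = 0.0
    for move, e in zip(moves, exps):
        acc += e
        if r <= acc:
            return move
    return moves[-1]

def td_update(weights, features, reward, next_value, value, alpha=0.01):
    """One TD(0)-style step on the linear weights (assumed variant;
    the abstract does not specify the exact TDL update used)."""
    delta = reward + next_value - value
    return [w + alpha * delta * f for w, f in zip(weights, features)]
```

With a very low temperature `tau`, the softmax policy becomes nearly greedy with respect to the learned value, while higher temperatures keep the simulations closer to uniformly random, which is one way to trade off the bias strength discussed in the abstract.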
Keywords :
Monte Carlo methods; function approximation; game theory; games of skill; learning (artificial intelligence); minimax techniques; temporal reasoning; trees (mathematics); MC simulations; MCTS algorithms; Monte Carlo simulations; Monte Carlo tree search; TDL; algorithm default policy; board game Othello; domain-specific knowledge; game-specific value function; game-theoretic values; linear function approximator; minimax tree; nonrandom moves; nonuniformly random default policy; state value function; temporal difference learning; tree policy; upper confidence bounds; Approximation algorithms; Approximation methods; Games; Law; Monte Carlo methods; Reliability;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Intelligence and Games (CIG), 2011 IEEE Conference on
Conference_Location :
Seoul
Print_ISBN :
978-1-4577-0010-1
Electronic_ISBN :
978-1-4577-0009-5
Type :
conf
DOI :
10.1109/CIG.2011.6032021
Filename :
6032021