Title :
Two Online Learning Playout Policies in Monte Carlo Go: An Application of Win/Loss States
Author :
Basaldua, Jacques ; Stewart, Steven ; Moreno-Vega, J. Marcos ; Drake, Peter D.
Author_Institution :
Dept. de Estadistica IO y Comput., Univ. de La Laguna, La Laguna, Spain
Abstract :
Recently, Monte Carlo tree search (MCTS) has become the dominant algorithm in computer Go. This paper compares two simulation algorithms known as playout policies. The base policy includes some mandatory domain-specific knowledge such as seki and urgency patterns, but is still simple to implement. The more advanced learning policy combines two different learning algorithms with those implemented in the base policy. This policy makes use of win/loss states (WLSs) to learn win rates for large sets of features. A very large experimental series of 7960 games includes results for different board sizes, in self-play and against a reference opponent, Fuego. Results are given for equal numbers of simulations and equal central processing unit (CPU) allocation. The improvement is around 100 Elo points, even with equal CPU allocation, and it increases with the number of simulations. Analyzing the proportion of moves generated by each part of the policy and the individual impact of each part provides further insight into how the policy is learning.
Keywords :
Monte Carlo methods; computer games; learning (artificial intelligence); tree searching; CPU; Elo points; FUEGO; MCTS; Monte Carlo Go; Monte Carlo tree search; central processing unit allocation; computer Go; learning algorithms; learning policy; mandatory domain-specific knowledge; online learning playout policies; seki patterns; urgency patterns; win-loss states; Computational modeling; Context; Games; Resource management; Shape; Tracking; Knowledge discovery; statistical learning; stochastic systems
Journal_Title :
Computational Intelligence and AI in Games, IEEE Transactions on
DOI :
10.1109/TCIAIG.2013.2292565