Monte-Carlo Go Reinforcement Learning Experiments

Author

Bouzy, Bruno ; Chaslot, Guillaume

Author_Institution

UFR de mathematiques et d'informatique, Univ. Rene Descartes, Paris

fYear

2006

fDate

22-24 May 2006

Firstpage

187

Lastpage

194

Abstract

This paper describes experiments using reinforcement learning techniques to compute pattern urgencies used during simulations performed in a Monte-Carlo Go architecture. Currently, Monte-Carlo is a popular technique for computer Go. In a previous study, Monte-Carlo was associated with domain-dependent knowledge in the Go-playing program Indigo. In 2003, a 3×3 pattern database was built manually. This paper explores the possibility of using reinforcement learning to automatically tune the 3×3 pattern urgencies. On 9×9 boards, within the Monte-Carlo architecture of Indigo, the result obtained by our automatic learning experiments is better than the manual method by a 3-point margin on average, which is satisfactory. Although the current results are promising on 19×19 boards, obtaining strictly positive results with such a large size remains to be done.

Keywords

computer games; learning (artificial intelligence); Go-playing program; Indigo; Monte-Carlo Go architecture; pattern database; pattern urgencies computing; reinforcement learning; Computational modeling; Computer architecture; Computer science; Databases; Distributed computing; Humans; Learning; Performance evaluation; Vocabulary; Computer Go; Monte-Carlo; Reinforcement Learning

fLanguage

English

Publisher

IEEE

Conference_Titel

Computational Intelligence and Games, 2006 IEEE Symposium on

Conference_Location

Reno, NV

Print_ISBN

1-4244-0464-9

Type

conf

DOI

10.1109/CIG.2006.311699

Filename

4100126