Reinforcement learning to train Ms. Pac-Man using higher-order action-relative inputs

Author

Bom, Luuk ; Henken, Ruud ; Wiering, Marco

Author_Institution

Inst. of Artificial Intell. & Cognitive Eng., Univ. of Groningen, Groningen, Netherlands

fYear

2013

fDate

16-19 April 2013

Firstpage

156

Lastpage

163

Abstract

Reinforcement learning algorithms enable an agent to optimize its behavior by interacting with a specific environment. Although some very successful applications of reinforcement learning algorithms have been developed, how to scale them up to large dynamic environments is still an open research question. In this paper we study the use of reinforcement learning on the popular arcade video game Ms. Pac-Man. To let Ms. Pac-Man learn quickly, we designed feature extraction algorithms that produce higher-order inputs from the game state. We constructed these higher-order features so that they are relative to the action of Ms. Pac-Man. The action-relative inputs are given to a single neural network, trained using Q-learning, which propagates the inputs for each action sequentially to obtain the Q-values of the different actions. The experimental results show that this approach allows the use of only 7 input units in the neural network, while still quickly obtaining very good playing behavior. Furthermore, the experiments show that our approach enables Ms. Pac-Man to successfully transfer its learned policy to a different maze on which it was not trained.
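
The evaluation scheme described in the abstract, one shared network that is fed the action-relative inputs of each action in turn, can be illustrated with a minimal sketch. This is not the authors' implementation. Only the count of 7 input units comes from the abstract; the class name `TinyQNet`, the hidden-layer size, the tanh activation, the learning rate, the discount factor, and the four-action set are all assumptions made for illustration, and the feature vectors are placeholders because the abstract does not say what the seven features are.

```python
import math
import random

N_INPUTS = 7   # number of input units stated in the abstract
N_HIDDEN = 10  # assumption: the abstract does not give a hidden-layer size
ACTIONS = ["left", "right", "up", "down"]  # assumption: four movement actions


class TinyQNet:
    """Hypothetical one-hidden-layer network with one output, Q(state, action)."""

    def __init__(self, n_in=N_INPUTS, n_hidden=N_HIDDEN, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        """Return the scalar Q-value and the hidden activations for input x."""
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        q = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return q, hidden

    def update(self, x, target, lr=0.01):
        """Take one gradient step on the squared error 0.5 * (target - Q)^2."""
        q, hidden = self.forward(x)
        err = target - q
        for j, h in enumerate(hidden):
            # Backpropagate through tanh using the pre-update output weight.
            grad_h = err * self.w2[j] * (1.0 - h * h)
            self.w2[j] += lr * err * h
            self.b1[j] += lr * grad_h
            for i, xi in enumerate(x):
                self.w1[j][i] += lr * grad_h * xi
        self.b2 += lr * err


def q_values(net, features_by_action):
    """Propagate each action's input vector through the same network in turn."""
    return {a: net.forward(x)[0] for a, x in features_by_action.items()}


def q_learning_target(net, reward, next_features_by_action,
                      gamma=0.95, terminal=False):
    """Standard Q-learning target: r + gamma * max over a' of Q(s', a')."""
    if terminal:
        return reward
    return reward + gamma * max(q_values(net, next_features_by_action).values())
```

In this sketch a caller would build one 7-element vector per action from the game state, call `q_values` to score the actions, act, and then call `update` on the chosen action's vector with the value returned by `q_learning_target`. Because the inputs are expressed relative to the action, one set of weights serves every action, which is consistent with the small input layer the abstract reports.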

Keywords

computer games; feature extraction; learning (artificial intelligence); neural nets; Q-learning; arcade video game Ms. Pac-Man; dynamic environments; game state; higher order action relative inputs; neural network; open research question; reinforcement learning algorithms; smart feature extraction algorithms; train Ms. Pac-Man; Biological neural networks; Games; Heuristic algorithms; Learning (artificial intelligence); Neurons; Training;

fLanguage

English

Publisher

IEEE

Conference_Titel

Adaptive Dynamic Programming And Reinforcement Learning (ADPRL), 2013 IEEE Symposium on

Conference_Location

Singapore

ISSN

2325-1824

Type

conf

DOI

10.1109/ADPRL.2013.6615002

Filename

6615002