• DocumentCode
    3269713
  • Title
    Reinforcement learning to train Ms. Pac-Man using higher-order action-relative inputs
  • Author
    Bom, Luuk ; Henken, Ruud ; Wiering, Marco
  • Author_Institution
    Inst. of Artificial Intell. & Cognitive Eng., Univ. of Groningen, Groningen, Netherlands
  • fYear
    2013
  • fDate
    16-19 April 2013
  • Firstpage
    156
  • Lastpage
    163
  • Abstract
    Reinforcement learning algorithms enable an agent to optimize its behavior by interacting with a specific environment. Although some very successful applications of reinforcement learning have been developed, how to scale these algorithms up to large dynamic environments remains an open research question. In this paper we study the use of reinforcement learning on the popular arcade video game Ms. Pac-Man. To let Ms. Pac-Man learn quickly, we designed feature extraction algorithms that produce higher-order inputs from the game state. We constructed these higher-order features to be relative to the action of Ms. Pac-Man. The action-relative inputs are given to a single neural network, trained using Q-learning, which propagates the inputs for each action in sequence to obtain the Q-values of the different actions. The experimental results show that this approach allows the use of only 7 input units in the neural network, while still quickly obtaining very good playing behavior. Furthermore, the experiments show that our approach enables Ms. Pac-Man to successfully transfer its learned policy to a different maze on which it was not trained.
  • Keywords
    computer games; feature extraction; learning (artificial intelligence); neural nets; Q-learning; arcade video game Ms. Pac-Man; dynamic environments; game state; higher order action relative inputs; neural network; open research question; reinforcement learning algorithms; smart feature extraction algorithms; train Ms. Pac-Man; Biological neural networks; Games; Heuristic algorithms; Learning (artificial intelligence); Neurons; Training;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Adaptive Dynamic Programming And Reinforcement Learning (ADPRL), 2013 IEEE Symposium on
  • Conference_Location
    Singapore
  • ISSN
    2325-1824
  • Type
    conf
  • DOI
    10.1109/ADPRL.2013.6615002
  • Filename
    6615002
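
The abstract describes a single neural network that receives action-relative inputs and is evaluated once per action to obtain the Q-values, with training by Q-learning. The following is a minimal sketch of that idea, not the authors' implementation. The 7 input units come from the abstract; the hidden-layer size, tanh activation, learning rate, discount factor, and all names (`ActionRelativeQNet`, `q_learning_target`) are assumptions made for illustration, and the feature vectors are expected to be supplied by a feature extractor that is not shown here.

```python
import numpy as np

N_INPUTS = 7    # number of input units reported in the abstract
N_HIDDEN = 20   # assumed hidden-layer size (not stated in the abstract)


class ActionRelativeQNet:
    """One network with one output: Q(s, a) from the inputs relative to action a."""

    def __init__(self, n_inputs=N_INPUTS, n_hidden=N_HIDDEN, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (n_hidden, n_inputs))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.1, n_hidden)
        self.b2 = 0.0

    def q_value(self, x):
        """Forward pass for the feature vector of a single action."""
        hidden = np.tanh(self.w1 @ x + self.b1)
        return float(self.w2 @ hidden + self.b2)

    def q_values(self, features):
        """Propagate each action's inputs in turn; features has shape (n_actions, n_inputs)."""
        return np.array([self.q_value(x) for x in features])

    def update(self, x, target, lr=0.05):
        """One gradient step moving Q(s, a) toward target; returns the error before the step."""
        hidden = np.tanh(self.w1 @ x + self.b1)
        error = target - (self.w2 @ hidden + self.b2)
        # Backpropagate through the output weights before changing them.
        grad_hidden = error * self.w2 * (1.0 - hidden ** 2)
        self.w2 += lr * error * hidden
        self.b2 += lr * error
        self.w1 += lr * np.outer(grad_hidden, x)
        self.b1 += lr * grad_hidden
        return float(error)


def q_learning_target(net, reward, next_features, gamma=0.95, terminal=False):
    """Standard Q-learning target: r + gamma * max over a' of Q(s', a')."""
    if terminal:
        return reward
    return reward + gamma * float(np.max(net.q_values(next_features)))
```

In use, an agent would compute one 7-element feature vector per available action, call `q_values` to pick an action, and after observing the reward call `update` with the chosen action's feature vector and the value returned by `q_learning_target`. Because the same weights score every action, the network size does not grow with the number of actions.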