• DocumentCode
    468384
  • Title
    Utility Based Q-learning to Maintain Cooperation in Prisoner's Dilemma Games
  • Author
    Moriyama, Koichi
  • Author_Institution
    Osaka Univ., Osaka
  • fYear
    2007
  • fDate
    2-5 Nov. 2007
  • Firstpage
    146
  • Lastpage
    152
  • Abstract
    This work deals with Q-learning in a multiagent environment. There are many multiagent Q-learning methods, most of which aim to converge to a Nash equilibrium, which is not desirable in games like the prisoner's dilemma (PD). However, normal Q-learning agents that choose actions stochastically to avoid local optima may bring about mutual cooperation in PD. Although such mutual cooperation usually occurs only in isolation, it can be maintained if the Q-function of cooperation becomes larger than that of defection after the cooperation. This work derives a theorem on how many times cooperation must occur to make the Q-function of cooperation larger than that of defection. In addition, building on the author's previous works, which distinguish utilities from rewards and use utilities for learning in PD, this work also derives a corollary on how much utility is necessary to make the Q-function larger by one-shot mutual cooperation.
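    The setting the abstract describes — ordinary Q-learning agents with stochastic action selection playing the iterated PD — can be sketched as follows. This is an illustrative sketch only, not the paper's method: the payoff values (T=5, R=3, P=1, S=0), the ε-greedy exploration rule, the learning parameters, and the stateless Q-formulation are all assumptions.

    ```python
    import random

    # Standard PD payoff matrix (assumed values; the paper's matrix is not given here):
    # T=5 (temptation), R=3 (mutual cooperation), P=1 (mutual defection), S=0 (sucker).
    PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
              ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

    class QAgent:
        """Stateless Q-learner choosing actions epsilon-greedily."""
        def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
            self.q = {"C": 0.0, "D": 0.0}
            self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

        def act(self, rng):
            if rng.random() < self.epsilon:       # stochastic exploration step
                return rng.choice(["C", "D"])
            return max(self.q, key=self.q.get)    # otherwise greedy on Q

        def update(self, action, reward):
            # One-step Q-update; the next-state value is the current max Q.
            target = reward + self.gamma * max(self.q.values())
            self.q[action] += self.alpha * (target - self.q[action])

    def play(rounds=1000, seed=0):
        """Run two Q-learning agents against each other in the iterated PD."""
        rng = random.Random(seed)
        a, b = QAgent(), QAgent()
        for _ in range(rounds):
            ma, mb = a.act(rng), b.act(rng)
            ra, rb = PAYOFF[(ma, mb)]
            a.update(ma, ra)
            b.update(mb, rb)
        return a, b
    ```

    With such agents, exploration occasionally produces mutual cooperation; the paper's theorem concerns how many such cooperations are needed before Q("C") exceeds Q("D") so that cooperation persists under greedy play.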
  • Keywords
    game theory; learning (artificial intelligence); multi-agent systems; Nash equilibrium; multiagent Q-learning methods; multiagent environment; prisoner's dilemma; stochastic method; utility based Q-learning; Game theory; Intelligent agent; Learning systems; Machine learning; Multiagent systems; Nash equilibrium; Stochastic processes; Toy industry;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    2007 IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT '07)
  • Conference_Location
    Fremont, CA
  • Print_ISBN
    978-0-7695-3027-7
  • Type
    conf
  • DOI
    10.1109/IAT.2007.60
  • Filename
    4407275