DocumentCode
468384
Title
Utility Based Q-learning to Maintain Cooperation in Prisoner's Dilemma Games
Author
Moriyama, Koichi
Author_Institution
Osaka Univ., Osaka
fYear
2007
fDate
2-5 Nov. 2007
Firstpage
146
Lastpage
152
Abstract
This work deals with Q-learning in a multiagent environment. There are many multiagent Q-learning methods, and most of them aim to converge to a Nash equilibrium, which is not desirable in games like the Prisoner's Dilemma (PD). However, normal Q-learning agents that use a stochastic method for choosing actions to avoid local optima may bring about mutual cooperation in PD. Although such mutual cooperation usually occurs only as an isolated event, it can be maintained if the Q-function of cooperation becomes larger than that of defection after the cooperation. This work derives a theorem on how many rounds of mutual cooperation are needed to make the Q-function of cooperation larger than that of defection. In addition, from the perspective of the author's previous works, which distinguish utilities from rewards and use utilities for learning in PD, this work also derives a corollary on how much utility is necessary to make the Q-function of cooperation larger through one-shot mutual cooperation.
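The abstract describes ordinary Q-learning agents whose stochastic (e.g., epsilon-greedy) action selection can occasionally produce mutual cooperation in the repeated PD. The Python sketch below illustrates that baseline setting only; the payoff values (T=5, R=3, P=1, S=0), the stateless Q-update, and all parameter choices are assumptions not given in this record, and the code implements plain Q-learning, not the paper's utility-based variant.

import random

# Standard PD payoffs (an assumption; the paper's exact matrix is not
# given in this record): T=5, R=3, P=1, S=0.
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

class QAgent:
    # Stateless Q-learner with epsilon-greedy (stochastic) action choice.
    def __init__(self, alpha=0.1, epsilon=0.1):
        self.q = {'C': 0.0, 'D': 0.0}
        self.alpha = alpha      # learning rate
        self.epsilon = epsilon  # exploration probability

    def act(self):
        # Exploration occasionally picks the non-greedy action; this is
        # how one-shot mutual cooperation can arise between two learners.
        if random.random() < self.epsilon:
            return random.choice(['C', 'D'])
        return max(self.q, key=self.q.get)

    def update(self, action, reward):
        # Simple stateless update: Q(a) <- Q(a) + alpha * (r - Q(a)).
        self.q[action] += self.alpha * (reward - self.q[action])

a, b = QAgent(), QAgent()
for _ in range(10000):
    xa, xb = a.act(), b.act()
    ra, rb = PAYOFF[(xa, xb)]
    a.update(xa, ra)
    b.update(xb, rb)
print(a.q, b.q)

Under these assumed rewards, Q('D') typically ends up above Q('C'), so any cooperation that exploration produces dies out; the paper's theorem and corollary concern exactly how many repeated cooperations, or how much substituted utility, would be needed to reverse that inequality.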
Keywords
game theory; learning (artificial intelligence); multi-agent systems; Nash equilibrium; multiagent Q-learning methods; multiagent environment; prisoner´s dilemma; stochastic method; utility based Q-learning; Game theory; Intelligent agent; Learning systems; Machine learning; Multiagent systems; Nash equilibrium; Stochastic processes; Toy industry;
fLanguage
English
Publisher
ieee
Conference_Titel
2007 IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT '07)
Conference_Location
Fremont, CA
Print_ISBN
978-0-7695-3027-7
Type
conf
DOI
10.1109/IAT.2007.60
Filename
4407275