DocumentCode
468384
Title
Utility Based Q-learning to Maintain Cooperation in Prisoner's Dilemma Games
Author
Moriyama, Koichi
Author_Institution
Osaka Univ., Osaka
fYear
2007
fDate
2-5 Nov. 2007
Firstpage
146
Lastpage
152
Abstract
This work deals with Q-learning in a multiagent environment. There are many multiagent Q-learning methods, and most of them aim to converge to a Nash equilibrium, which is not desirable in games like the Prisoner's Dilemma (PD). However, normal Q-learning agents that use a stochastic method for choosing actions to avoid local optima may bring about mutual cooperation in PD. Although such mutual cooperation usually occurs only as an isolated event, it can be maintained if the Q-function of cooperation becomes larger than that of defection after the cooperation. This work derives a theorem on how many rounds of mutual cooperation are needed to make the Q-function of cooperation larger than that of defection. In addition, from the perspective of the author's previous works, which distinguish utilities from rewards and use utilities for learning in PD, this work also derives a corollary on how much utility is necessary to make the Q-function of cooperation larger through one-shot mutual cooperation.
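The abstract describes ordinary Q-learning agents whose stochastic (e.g., epsilon-greedy) action selection can occasionally produce mutual cooperation in the repeated PD. The Python sketch below illustrates that baseline setting only; the payoff values (T=5, R=3, P=1, S=0), the stateless Q-update, and all parameter choices are assumptions not given in this record, and the code implements plain Q-learning, not the paper's utility-based variant.

import random

# Standard PD payoffs (an assumption; the paper's exact matrix is not
# given in this record): T=5, R=3, P=1, S=0.
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

class QAgent:
    # Stateless Q-learner with epsilon-greedy (stochastic) action choice.
    def __init__(self, alpha=0.1, epsilon=0.1):
        self.q = {'C': 0.0, 'D': 0.0}
        self.alpha = alpha      # learning rate
        self.epsilon = epsilon  # exploration probability

    def act(self):
        # Exploration occasionally picks the non-greedy action; this is
        # how one-shot mutual cooperation can arise between two learners.
        if random.random() < self.epsilon:
            return random.choice(['C', 'D'])
        return max(self.q, key=self.q.get)

    def update(self, action, reward):
        # Simple stateless update: Q(a) <- Q(a) + alpha * (r - Q(a)).
        self.q[action] += self.alpha * (reward - self.q[action])

a, b = QAgent(), QAgent()
for _ in range(10000):
    xa, xb = a.act(), b.act()
    ra, rb = PAYOFF[(xa, xb)]
    a.update(xa, ra)
    b.update(xb, rb)
print(a.q, b.q)

Under these assumed rewards, Q('D') typically ends up above Q('C'), so any cooperation that exploration produces dies out; the paper's theorem and corollary concern exactly how many repeated cooperations, or how much substituted utility, would be needed to reverse that inequality.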
Keywords
game theory; learning (artificial intelligence); multi-agent systems; Nash equilibrium; multiagent Q-learning methods; multiagent environment; prisoner´s dilemma; stochastic method; utility based Q-learning; Game theory; Intelligent agent; Learning systems; Machine learning; Multiagent systems; Nash equilibrium; Stochastic processes; Toy industry;
fLanguage
English
Publisher
ieee
Conference_Titel
2007 IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT '07)
Conference_Location
Fremont, CA
Print_ISBN
978-0-7695-3027-7
Type
conf
DOI
10.1109/IAT.2007.60
Filename
4407275