  • DocumentCode
    1580836
  • Title
    Learning to Reach Optimal Equilibrium by Influence of Other Agents Opinion
  • Author
    Barrios-Aranibar, Dennis ; Goncalves, Luiz M. G.

  • Author_Institution
    Fed. Univ. of Rio Grande do Norte, Natal
  • fYear
    2007
  • Firstpage
    198
  • Lastpage
    203
  • Abstract
    In this work, the authors extend the reinforcement learning paradigm for multi-agent systems called "influence value reinforcement learning" (IVRL). In previous work, an algorithm for repeated games was proposed, and it outperformed traditional paradigms. Here, the authors define an algorithm based on this paradigm for use when agents must learn from delayed rewards, i.e., an influence value reinforcement learning algorithm for two-agent stochastic games. The IVRL paradigm is based on the social interaction of people, especially the fact that people communicate to each other what they think about one another's actions, and these opinions influence each other's behavior. A modified version of the Q-learning algorithm using this paradigm was constructed: the so-called IVQ-learning algorithm was implemented and compared with versions of Q-learning for independent learning and joint-action learning. Our approach shows a higher probability of converging to an optimal equilibrium than the IQ-learning and JAQ-learning algorithms, especially when exploration increases.
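    The abstract describes the mechanism only at a high level: a Q-learning update augmented by opinions that agents communicate about each other's actions. As a rough, non-authoritative sketch of that idea, the Python code below adds a weighted opinion term to a standard Q-learning temporal-difference step. The class name IVQAgent, the opinion() heuristic, the beta weight, and the toy matrix game are all assumptions for illustration, not the paper's definitions.

    import random
    from collections import defaultdict

    class IVQAgent:
        """Hypothetical influence-value Q-learner (illustrative sketch only)."""

        def __init__(self, actions, alpha=0.1, gamma=0.9, beta=0.05, epsilon=0.1):
            self.q = defaultdict(float)   # Q-table keyed by (state, action)
            self.actions = list(actions)
            self.alpha = alpha            # learning rate
            self.gamma = gamma            # discount factor
            self.beta = beta              # assumed weight of the communicated opinion
            self.epsilon = epsilon        # exploration rate

        def act(self, state):
            # Epsilon-greedy action selection.
            if random.random() < self.epsilon:
                return random.choice(self.actions)
            return max(self.actions, key=lambda a: self.q[(state, a)])

        def opinion(self, state, other_action):
            # Assumed opinion signal: how the other agent's action compares,
            # under this agent's own Q-values, to the action it would prefer.
            best = max(self.q[(state, a)] for a in self.actions)
            return self.q[(state, other_action)] - best

        def update(self, state, action, reward, next_state, received_opinion):
            # Standard Q-learning TD step plus the weighted opinion term.
            best_next = max(self.q[(next_state, a)] for a in self.actions)
            td = reward + self.gamma * best_next - self.q[(state, action)]
            self.q[(state, action)] += self.alpha * td + self.beta * received_opinion

    if __name__ == "__main__":
        # Toy demo: a single-state repeated 2x2 coordination game where both
        # agents receive the same payoff; ("a", "a") is the optimal equilibrium.
        payoff = {("a", "a"): 1.0, ("a", "b"): 0.0, ("b", "a"): 0.0, ("b", "b"): 0.5}
        agent1 = IVQAgent(actions=["a", "b"])
        agent2 = IVQAgent(actions=["a", "b"])
        s = 0  # single state
        for _ in range(2000):
            a1, a2 = agent1.act(s), agent2.act(s)
            r = payoff[(a1, a2)]
            # Each agent learns from its own reward plus the other's opinion.
            agent1.update(s, a1, r, s, agent2.opinion(s, a1))
            agent2.update(s, a2, r, s, agent1.opinion(s, a2))
        print(agent1.act(s), agent2.act(s))  # typically converges to ('a', 'a')

    Under these assumptions, the opinion term nudges each agent toward actions its partner rates highly, which is one plausible reading of how such an approach could keep exploring agents near the optimal joint action.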
  • Keywords
    learning (artificial intelligence); multi-robot systems; stochastic games; IQ-learning algorithm; JAQ-learning algorithms; IVQ-learning algorithm; influence value reinforcement learning; multiagent systems; Automation; Collaborative work; Delay; Game theory; Hybrid intelligent systems; Learning; Multiagent systems; Nash equilibrium; Stochastic processes; Testing;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    7th International Conference on Hybrid Intelligent Systems (HIS 2007)
  • Conference_Location
    Kaiserslautern
  • Print_ISBN
    978-0-7695-2946-2
  • Type
    conf
  • DOI
    10.1109/HIS.2007.61
  • Filename
    4344051