Improving Multi-agent Learners Using Less-Biased Value Estimators

Author

Sherief Abdallah;Michael Kaisers

Author_Institution

Fac. of Eng. &

Volume

2

fYear

2015

Firstpage

120

Lastpage

124

Abstract

Many value-based and policy-search reinforcement learning algorithms have been applied to multi-agent settings. Value-based learners estimate the expected return (value) of each state-action pair and then derive a policy from these estimates. Policy-search learners optimize the agent's policy directly: they represent the policy with parameters and tune the parameter values to maximize the expected return. Although the two classes of algorithms have been regarded as contrasting, we note that several policy-search algorithms (e.g., Weighted Policy Learner and Infinitesimal Gradient Ascent) need a method for estimating expected returns. In practice, these policy-search algorithms internally use an update equation to improve their value estimates incrementally. In this paper we present the first detailed study of the effect of using different value-based learning algorithms as components of policy-search learners. Our results show that the particular choice can significantly affect performance.
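
The interplay the abstract describes, a policy-search learner that relies on an internal incremental value estimate, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names (`update_value`, `policy_step`, `run`), the step sizes `alpha` and `eta`, the exponential moving-average value update, the simple renormalized gradient-style policy step, and the two-action stateless setting with fixed rewards. It is not the Weighted Policy Learner or Infinitesimal Gradient Ascent update.

```python
import random


def update_value(q, action, reward, alpha=0.1):
    """Incremental value estimate (assumed form): Q(a) <- Q(a) + alpha * (r - Q(a))."""
    q = list(q)  # copy so the caller's list is not mutated
    q[action] += alpha * (reward - q[action])
    return q


def policy_step(policy, q, eta=0.05):
    """Shift probability toward actions whose estimated value exceeds the
    policy's current average value, then renormalize (illustrative rule only)."""
    avg = sum(p * v for p, v in zip(policy, q))
    # Keep every probability strictly positive so each action stays explorable.
    raw = [max(p + eta * (v - avg), 1e-6) for p, v in zip(policy, q)]
    total = sum(raw)
    return [x / total for x in raw]


def run(rewards=(1.0, 0.0), steps=2000, seed=0):
    """Stateless toy problem: action i always pays rewards[i]."""
    rng = random.Random(seed)
    n = len(rewards)
    q = [0.0] * n
    policy = [1.0 / n] * n
    for _ in range(steps):
        action = rng.choices(range(n), weights=policy)[0]
        q = update_value(q, action, rewards[action])  # value-estimation component
        policy = policy_step(policy, q)               # policy-search component
    return q, policy


if __name__ == "__main__":
    q, policy = run()
    print("value estimates:", q)
    print("policy:", policy)
```

In this sketch the value estimator is an isolated component: replacing `update_value` with a different estimator leaves `policy_step` untouched, which is the kind of substitution whose effect the abstract says the paper studies.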

Keywords

"Games","Prediction algorithms","Algorithm design and analysis","Approximation algorithms","Learning (artificial intelligence)","Mathematical model","Electronic mail"

Publisher

IEEE

Conference_Titel

Web Intelligence and Intelligent Agent Technology (WI-IAT), 2015 IEEE / WIC / ACM International Conference on

Type

conf

DOI

10.1109/WI-IAT.2015.113

Filename

7397346