Title :
Decentralized multi-armed bandit with imperfect observations
Author :
Liu, Keqin ; Zhao, Qing ; Krishnamachari, Bhaskar
Author_Institution :
Univ. of California, Davis, CA, USA
Date :
Sept. 29, 2010 - Oct. 1, 2010
Abstract :
We consider decentralized multi-armed bandit problems with multiple distributed players. At each time, each player chooses one of N independent arms with unknown reward statistics to play. Players do not exchange information regarding their observations or actions. A collision occurs when multiple players choose the same arm. In this case, the reward obtained by a player involved in the collision deviates from the actual reward offered by this arm in an arbitrary and unknown way, making it harder to learn the underlying reward statistics of this arm. The objective is to design a decentralized arm selection policy that minimizes the system regret, defined as the total reward loss with respect to the ideal scenario of a known reward model and centralized scheduling among players. We propose a decentralized policy that achieves O(√T) system regret, where T is the length of the time horizon. Because this regret grows sublinearly in T, the policy achieves the same maximum average reward as in the ideal scenario. Furthermore, the policy ensures fairness among players, i.e., players achieve the same local average reward at the same rate. These problems find applications in cognitive radio networks, multi-agent systems, Internet advertising, and Web search.
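The system regret defined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' policy or analysis; it only evaluates the expected regret of a given sequence of joint arm choices. Several things here are assumptions made for illustration: the function names, the use of known arm means for evaluation, the reading of the ideal scenario as the M players occupying the M best arms without collision, and a fixed expected reward (zero by default) for a player involved in a collision, whereas the abstract allows the collided reward to deviate in an arbitrary and unknown way.

```python
from collections import Counter


def ideal_reward_per_step(means, num_players):
    """Expected per-step reward when each of the M best arms is played by one player.

    ASSUMPTION: this is how the 'ideal scenario' (known reward model,
    centralized scheduling) is modeled in this sketch.
    """
    return sum(sorted(means, reverse=True)[:num_players])


def system_regret(means, choices, collision_reward=0.0):
    """Expected system regret of a sequence of joint arm choices.

    means:            expected reward of each arm (hypothetical values,
                      known here only so the regret can be evaluated)
    choices:          one tuple per time step; entry i is the arm index
                      chosen by player i at that step
    collision_reward: ASSUMED expected reward of a player involved in a
                      collision (the abstract leaves this arbitrary)
    """
    num_players = len(choices[0])
    ideal_total = ideal_reward_per_step(means, num_players) * len(choices)

    collected = 0.0
    for step in choices:
        counts = Counter(step)  # how many players picked each arm
        for arm in step:
            # A player alone on an arm earns that arm's expected reward;
            # a player sharing an arm earns the assumed collision reward.
            collected += means[arm] if counts[arm] == 1 else collision_reward

    return ideal_total - collected


if __name__ == "__main__":
    arm_means = [0.9, 0.5, 0.2]             # hypothetical arm means
    schedule = [(0, 1), (0, 0), (1, 2)]     # two players, three time steps
    print(system_regret(arm_means, schedule))
```

Under these assumptions, a schedule in which the two players always occupy the two best arms without colliding has zero regret (up to floating-point rounding), while every collision or choice of a worse arm adds to the total.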
Keywords :
game theory; statistical analysis; Internet advertising; Web search; centralized scheduling; cognitive radio network; decentralized arm selection policy; decentralized multiarmed bandit problem; maximum average reward; multiagent system; multiple distributed players; reward statistics; system regret; total reward loss; Advertising; Cognitive radio; History; Internet; Multiagent systems; Radio transmitters; USA Councils;
Conference_Title :
2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
Conference_Location :
Allerton, IL
Print_ISBN :
978-1-4244-8215-3
DOI :
10.1109/ALLERTON.2010.5707117