Multi-armed bandits in the presence of side observations in social networks

Author

Buccapatnam, Swapna ; Eryilmaz, Atilla ; Shroff, Ness B.

Author_Institution

Dept. of Electr. & Comput. Eng. (ECE), Ohio State Univ., Columbus, OH, USA

fYear

2013

fDate

10-13 Dec. 2013

Firstpage

7309

Lastpage

7314

Abstract

We consider the decision problem of an external agent choosing to execute one of M actions for each user in a social network. We assume that observing a user's actions provides valuable information for a larger set of users since each user's preferences are interrelated with those of her social peers. This falls into the well-known setting of the multi-armed bandit (MAB) problems, but with the critical new component of side observations resulting from interactions between users. Our contributions in this work are as follows: 1) We model the MAB problem in the presence of side observations and obtain an asymptotic lower bound (as a function of the network structure) on the regret (loss) of any uniformly good policy that achieves the maximum long-term average reward. 2) We propose a randomized policy that explores actions for each user at a rate that is a function of her network position. We show that this policy achieves the asymptotic lower bound on regret associated with actions that are unpopular for all the users. 3) We derive an upper bound on the regret of existing Upper Confidence Bound (UCB) policies for MAB problems modified for our setting of side observations. We present case studies to show that these UCB policies are agnostic of the network structure and this causes their regret to suffer in a network setting. Our investigations in this work reveal the significant gains that can be obtained even through static network-aware policies.

Keywords

greedy algorithms; learning (artificial intelligence); linear programming; social networking (online); stochastic processes; ε-greedy-LP policy; MAB problems; UCB policy; asymptotic lower bound; decision problem; external agent; maximum long term average reward; multiarmed bandit problem; network structure; randomized policy; side observations; social networks; static network-aware policy; stochastic bandit problem; upper confidence bound policy; user preferences; Advertising; Facebook; Motion pictures; Nickel; Resource management; Upper bound

fLanguage

English

Publisher

IEEE

Conference_Titel

Decision and Control (CDC), 2013 IEEE 52nd Annual Conference on

Conference_Location

Firenze

ISSN

0743-1546

Print_ISBN

978-1-4673-5714-2

Type

conf

DOI

10.1109/CDC.2013.6761049

Filename

6761049