• DocumentCode
    3538692
  • Title
    Multi-armed bandits in the presence of side observations in social networks
  • Author
    Buccapatnam, Swapna ; Eryilmaz, Atilla ; Shroff, Ness B.
  • Author_Institution
    Dept. of Electr. & Comput. Eng. (ECE), Ohio State Univ., Columbus, OH, USA
  • fYear
    2013
  • fDate
    10-13 Dec. 2013
  • Firstpage
    7309
  • Lastpage
    7314
  • Abstract
    We consider the decision problem of an external agent choosing to execute one of M actions for each user in a social network. We assume that observing a user's actions provides valuable information for a larger set of users since each user's preferences are interrelated with those of her social peers. This falls into the well-known setting of the multi-armed bandit (MAB) problems, but with the critical new component of side observations resulting from interactions between users. Our contributions in this work are as follows: 1) We model the MAB problem in the presence of side observations and obtain an asymptotic lower bound (as a function of the network structure) on the regret (loss) of any uniformly good policy that achieves the maximum long-term average reward. 2) We propose a randomized policy that explores actions for each user at a rate that is a function of her network position. We show that this policy achieves the asymptotic lower bound on regret associated with actions that are unpopular for all the users. 3) We derive an upper bound on the regret of existing Upper Confidence Bound (UCB) policies for MAB problems modified for our setting of side observations. We present case studies to show that these UCB policies are agnostic of the network structure and this causes their regret to suffer in a network setting. Our investigations in this work reveal the significant gains that can be obtained even through static network-aware policies.
  • Keywords
    greedy algorithms; learning (artificial intelligence); linear programming; social networking (online); stochastic processes; ε-greedy-LP policy; MAB problems; UCB policy; asymptotic lower bound; decision problem; external agent; linear programming; maximum long term average reward; multiarmed bandit problem; network structure; randomized policy; side observations; social networks; static network-aware policy; stochastic bandit problem; upper confidence bound policy; user preferences; Advertising; Facebook; Motion pictures; Nickel; Resource management; Upper bound;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Decision and Control (CDC), 2013 IEEE 52nd Annual Conference on
  • Conference_Location
    Firenze
  • ISSN
    0743-1546
  • Print_ISBN
    978-1-4673-5714-2
  • Type
    conf
  • DOI
    10.1109/CDC.2013.6761049
  • Filename
    6761049
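
The abstract mentions UCB policies modified for side observations. The following is a minimal illustrative sketch of that idea, not the authors' algorithm: it assumes Bernoulli rewards, a standard UCB1 index, and one plausible reading of the side-observation model in which choosing action a for user u also reveals the response of each of u's neighbors to the same action. The function name `ucb_with_side_observations` and the parameters `means`, `neighbors`, and `horizon` are hypothetical and do not come from the paper.

```python
import math
import random


def ucb_with_side_observations(means, neighbors, horizon, seed=0):
    """Run a UCB1-style policy where each pull also reveals neighbors' rewards.

    means[u][a]  : assumed Bernoulli mean reward of action a for user u
    neighbors[u] : users whose reward for the same action is also observed
    Returns (counts, estimates), both indexed as [user][action].
    """
    rng = random.Random(seed)
    n_users, n_actions = len(means), len(means[0])
    counts = [[0] * n_actions for _ in range(n_users)]
    sums = [[0.0] * n_actions for _ in range(n_users)]

    def index(u, a, t):
        # Unobserved (user, action) pairs get an infinite index so they are tried first.
        if counts[u][a] == 0:
            return math.inf
        mean = sums[u][a] / counts[u][a]
        return mean + math.sqrt(2.0 * math.log(t) / counts[u][a])

    for t in range(1, horizon + 1):
        # Pick one action per user from the statistics available at the start of the round.
        chosen = [
            max(range(n_actions), key=lambda a: index(u, a, t))
            for u in range(n_users)
        ]
        for u, a in enumerate(chosen):
            # Side observations: the user and her neighbors all reveal a sample for action a.
            for v in {u, *neighbors[u]}:
                reward = 1.0 if rng.random() < means[v][a] else 0.0
                counts[v][a] += 1
                sums[v][a] += reward

    estimates = [
        [sums[u][a] / counts[u][a] if counts[u][a] else 0.0 for a in range(n_actions)]
        for u in range(n_users)
    ]
    return counts, estimates


if __name__ == "__main__":
    # Hypothetical star network: user 0 is connected to users 1 and 2.
    means = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]]
    neighbors = [[1, 2], [0], [0]]
    counts, estimates = ucb_with_side_observations(means, neighbors, horizon=200)
    print(counts)
    print(estimates)
```

In this sketch the index for each user ignores how many neighbors she has, so the policy collects side observations without planning for them. That is the sense in which the abstract calls such UCB policies agnostic of the network structure.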