• DocumentCode
    2095353
  • Title

    Dynamic bandit with covariates: Strategic solutions with application to wireless resource allocation

  • Author

    Maghsudi, Setareh ; Stanczak, Slawomir

  • Author_Institution
    Heinrich-Hertz-Lehrstuhl fur Informationstheorie und Theor. Informationstechnik, Tech. Univ. Berlin, Berlin, Germany
  • fYear
    2013
  • fDate
    9-13 June 2013
  • Firstpage
    5898
  • Lastpage
    5902
  • Abstract
    Multi-armed bandit (MAB) problems form a class of sequential optimization problems in which a player sequentially pulls an arm, selected from a known and finite set of arms, in order to achieve an initially unknown reward. The player aims at maximizing the accumulated reward over a predefined game horizon. In the bandit setting, a dilemma arises between pulling the currently most promising arm, i.e. the arm with the highest empirical mean reward (exploitation), and sampling arms in order to improve the estimates of their reward-generating processes (exploration). In this paper, we study a specific subset of MAB problems, namely stochastic covariate bandits, where it is assumed that the series of instantaneous rewards generated by each arm can be attributed to a specific distribution, and that some side information (covariate) is revealed to the player at the beginning of each game trial. In this setting, we address the exploitation-exploration dilemma by proposing two strategies for arm selection (allocation rules). Provided that the underlying regression process is trustworthy, the proposed strategies are strongly consistent, in the sense that the accumulated reward is asymptotically equivalent, almost surely, to that obtained by always playing the best arm. Further, we illustrate that the covariate bandit model and our allocation strategies are applicable to wireless networking scenarios by considering the relay selection problem as a case study.
  • Keywords
    covariance analysis; optimisation; regression analysis; relay networks (telecommunication); resource allocation; MAB problems; arm selection; covariate bandit model; dynamic bandit; exploitation-exploration dilemma; game horizon; game trial; multi-armed bandit problems; regression process; relay selection; reward generating processes; sampling arms; sequential optimization problems; stochastic covariate bandits; strategic solutions; wireless networking scenarios; wireless resource allocation; Games; Radiation detectors; Relays; Resource management; Throughput; Wireless communication; Wireless sensor networks; Bandit theory; change-point detection; randomization; regression; relay selection; wireless network;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Communications (ICC), 2013 IEEE International Conference on
  • Conference_Location
    Budapest
  • ISSN
    1550-3607
  • Type

    conf

  • DOI
    10.1109/ICC.2013.6655540
  • Filename
    6655540