DocumentCode :
2095353
Title :
Dynamic bandit with covariates: Strategic solutions with application to wireless resource allocation
Author :
Maghsudi, Setareh ; Stanczak, Slawomir
Author_Institution :
Heinrich-Hertz-Lehrstuhl für Informationstheorie und Theor. Informationstechnik, Tech. Univ. Berlin, Berlin, Germany
fYear :
2013
fDate :
9-13 June 2013
Firstpage :
5898
Lastpage :
5902
Abstract :
Multi-armed bandit (MAB) problems form a class of sequential optimization problems in which a player sequentially pulls an arm, selected from a known and finite set of arms, in order to achieve an initially unknown reward. The player aims at maximizing the accumulated reward over a predefined game horizon. In the bandit setting, a dilemma arises between pulling the currently most promising arm, i.e. the arm with the highest empirical mean reward (exploitation), and sampling other arms in order to improve the estimates of their reward-generating processes (exploration). In this paper we study a specific subset of MAB problems, namely stochastic covariate bandits, where it is assumed that the series of instantaneous rewards generated by each arm can be attributed to a specific distribution, and that some side information (covariate) is revealed to the player at the beginning of each game trial. In this setting, we address the exploitation-exploration dilemma by proposing two strategies for arm selection (allocation rules). Provided that the underlying regression process is trustworthy, the proposed strategies are strongly consistent, in the sense that the accumulated reward is asymptotically equivalent, almost surely, to that achieved with the best arm. Further, we illustrate that the covariate bandit model and our allocation strategies are applicable to wireless networking scenarios by considering the relay selection problem as a case study.
Keywords :
covariance analysis; optimisation; regression analysis; relay networks (telecommunication); resource allocation; MAB problems; arm selection; covariate bandit model; dynamic bandit; exploitation-exploration dilemma; game horizon; game trial; multi-armed bandit problems; regression process; relay selection; reward generating processes; sampling arms; sequential optimization problems; stochastic covariate bandits; strategic solutions; wireless networking scenarios; wireless resource allocation; Games; Radiation detectors; Relays; Resource management; Throughput; Wireless communication; Wireless sensor networks; Bandit theory; change-point detection; randomization; regression; relay selection; wireless network;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Communications (ICC), 2013 IEEE International Conference on
Conference_Location :
Budapest
ISSN :
1550-3607
Type :
conf
DOI :
10.1109/ICC.2013.6655540
Filename :
6655540