Regional Information Center for Science and Technology - Dynamic bandit with covariates: Strategic solutions with application to wireless resource allocation

DocumentCode :

2095353

Title :

Dynamic bandit with covariates: Strategic solutions with application to wireless resource allocation

Author :

Maghsudi, Setareh ; Stanczak, Slawomir

Author_Institution :

Heinrich-Hertz-Lehrstuhl für Informationstheorie und Theor. Informationstechnik, Tech. Univ. Berlin, Berlin, Germany

Year :

2013

Date :

9-13 June 2013

Firstpage :

5898

Lastpage :

5902

Abstract :

Multi-armed bandit (MAB) problems form a class of sequential optimization problems in which a player sequentially pulls an arm, selected from a known and finite set of arms, in order to obtain an initially unknown reward. The player aims to maximize the accumulated reward over a predefined game horizon. In a bandit setting, a dilemma arises between pulling the currently most promising arm, i.e. the arm with the highest empirical mean reward (exploitation), and sampling other arms in order to improve the estimates of their reward-generating processes (exploration). In this paper we study a specific subset of MAB problems, namely stochastic covariate bandits, where it is assumed that the series of instantaneous rewards generated by each arm can be attributed to a specific distribution, and that some side information (covariate) is revealed to the player at the beginning of each game trial. In this setting, we address the exploitation-exploration dilemma by proposing two strategies for arm selection (allocation rules). Provided that the underlying regression process is trustworthy, the proposed strategies are strongly consistent, in the sense that the accumulated reward is asymptotically equivalent, almost surely, to that obtained with the best arm. Further, we illustrate that the covariate bandit model and our allocation strategies are applicable to wireless networking scenarios by considering the relay selection problem as a case study.
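
The covariate bandit setting described in the abstract can be illustrated with a small simulation. The sketch below is not one of the two strategies proposed in the paper, which this record does not specify. It substitutes a generic epsilon-greedy rule with a per-arm simple linear regression on a scalar covariate. The environment (two arms with linear mean rewards, Gaussian noise) and the names `ArmModel` and `run` are assumptions made for illustration only.

```python
import random


class ArmModel:
    """Running simple linear regression (reward ~ a + b * x) for one arm."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, x, y):
        # Accumulate the sufficient statistics of the least-squares fit.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def predict(self, x):
        if self.n == 0:
            return 0.0
        mean_x = self.sx / self.n
        mean_y = self.sy / self.n
        var_x = self.sxx - self.n * mean_x * mean_x
        if var_x < 1e-12:
            return mean_y  # too little spread in x to estimate a slope
        slope = (self.sxy - self.n * mean_x * mean_y) / var_x
        return mean_y + slope * (x - mean_x)


def run(horizon=5000, epsilon=0.1, seed=0):
    """Return collected expected reward divided by the oracle's."""
    rng = random.Random(seed)
    # Hypothetical environment: (intercept, slope) of each arm's mean reward.
    true_params = [(0.2, 0.6), (0.8, -0.6)]
    models = [ArmModel() for _ in true_params]
    total = best_total = 0.0
    for _ in range(horizon):
        x = rng.random()  # covariate revealed before the arm is chosen
        means = [a + b * x for a, b in true_params]
        if rng.random() < epsilon:
            arm = rng.randrange(len(models))  # exploration
        else:
            # Exploitation: arm with the highest predicted reward at x.
            arm = max(range(len(models)), key=lambda k: models[k].predict(x))
        reward = means[arm] + rng.gauss(0.0, 0.1)  # noisy observed reward
        models[arm].update(x, reward)
        total += means[arm]
        best_total += max(means)
    return total / best_total


if __name__ == "__main__":
    print(f"fraction of oracle reward: {run():.3f}")
```

Because `epsilon` is held constant here, a fixed share of trials is always spent exploring, so this sketch does not have the strong consistency property claimed for the paper's strategies. A decaying exploration rate would be one way to approach that behavior.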

Keywords :

covariance analysis; optimisation; regression analysis; relay networks (telecommunication); resource allocation; MAB problems; arm selection; covariate bandit model; dynamic bandit; exploitation-exploration dilemma; game horizon; game trial; multi-armed bandit problems; regression process; relay selection; reward generating processes; sampling arms; sequential optimization problems; stochastic covariate bandits; strategic solutions; wireless networking scenarios; wireless resource allocation; Games; Radiation detectors; Relays; Resource management; Throughput; Wireless communication; Wireless sensor networks; Bandit theory; change-point detection; randomization; regression; relay selection; wireless network;

Language :

English

Publisher :

IEEE

Conference_Title :

Communications (ICC), 2013 IEEE International Conference on

Conference_Location :

Budapest

ISSN :

1550-3607

Type :

conf

DOI :

10.1109/ICC.2013.6655540

Filename :

6655540

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2095353