Regional Information Center for Science and Technology - EXP3 with drift detection for the switching bandit problem

DocumentCode :

3703553

Title :

EXP3 with drift detection for the switching bandit problem

Author :

Robin Allesiardo;Raphaël Féraud

Author_Institution :

Orange Labs, 22300 Lannion, France

Year :

2015

Firstpage :

Lastpage :

Abstract :

The multi-armed bandit is a model of exploration and exploitation, where one must select, from a finite set of arms, the one which maximizes the cumulative reward up to the time horizon T. For the adversarial multi-armed bandit problem, where the sequence of rewards is chosen by an oblivious adversary, the notion of a single best arm over the whole time horizon is too restrictive for applications such as ad-serving, where the best ad can change over time. In this paper, we consider a variant of the adversarial multi-armed bandit problem, where the time horizon is divided into unknown time periods within which rewards are drawn from stochastic distributions. During each time period, there is an optimal arm which may be different from the optimal arm of the previous time period. We present an algorithm that takes advantage of the constant exploration of EXP3 to detect when the best arm changes. Its analysis shows that on a run divided into N periods where the best arm changes, the proposed algorithm achieves a regret in O(N √T log T).
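
The idea summarized in the abstract, reusing the samples from EXP3's constant uniform exploration to notice that the best arm has changed, can be illustrated with a minimal Python sketch. This is not the authors' algorithm as published: the function names, the parameters `min_obs` and `threshold`, and the simple "compare empirical means, then reset the weights" rule are all assumptions made for illustration. Only the EXP3 weight update and the mixing with uniform exploration are standard.

```python
import math
import random


def exp3_probabilities(weights, gamma):
    """Mix the normalized weights with uniform exploration of rate gamma."""
    total = sum(weights)
    k = len(weights)
    return [(1.0 - gamma) * w / total + gamma / k for w in weights]


def exp3_with_reset(reward_fn, n_arms, horizon, gamma=0.1,
                    min_obs=20, threshold=0.3, seed=0):
    """Run EXP3 and reset its weights when exploration data suggests a new best arm.

    reward_fn(t, arm) must return a reward in [0, 1]. The reset rule
    (min_obs, threshold) is an illustrative assumption, not the paper's test.
    Returns (total_reward, reset_times).
    """
    rng = random.Random(seed)
    weights = [1.0] * n_arms
    sums = [0.0] * n_arms    # rewards observed on uniform exploration steps
    counts = [0] * n_arms    # uniform exploration pulls per arm
    total_reward, reset_times = 0.0, []

    for t in range(horizon):
        probs = exp3_probabilities(weights, gamma)
        # Sampling this way is equivalent to drawing the arm from `probs`.
        explore = rng.random() < gamma
        if explore:
            arm = rng.randrange(n_arms)
        else:
            arm = rng.choices(range(n_arms), weights=weights)[0]
        reward = reward_fn(t, arm)
        total_reward += reward

        # Importance-weighted EXP3 update for the pulled arm only.
        weights[arm] *= math.exp(gamma * (reward / probs[arm]) / n_arms)
        top = max(weights)
        weights = [w / top for w in weights]  # rescale to avoid overflow

        if explore:
            sums[arm] += reward
            counts[arm] += 1
        # Once every arm has enough exploration samples, test for drift.
        if min(counts) >= min_obs:
            means = [s / c for s, c in zip(sums, counts)]
            leader = weights.index(max(weights))
            if max(means) - means[leader] > threshold:
                weights = [1.0] * n_arms   # forget the outdated best arm
                reset_times.append(t)
            sums, counts = [0.0] * n_arms, [0] * n_arms

    return total_reward, reset_times
```

As a usage example, a hypothetical `reward_fn` that pays 1.0 for arm 0 during the first half of the run and for arm 1 during the second half lets one inspect `reset_times` to see where the sketch reacts to the switch. Only exploration pulls feed the detector because they are drawn uniformly, so their empirical means are not biased by the current weights.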

Keywords :

"Algorithm design and analysis","Detectors","Switches","Games","Stochastic processes","Time measurement","Monitoring"

Publisher :

IEEE

Conference_Title :

2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA)

Print_ISBN :

978-1-4673-8272-4

Type :

conf

DOI :

10.1109/DSAA.2015.7344834

Filename :

7344834

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3703553