DocumentCode :
3703553
Title :
EXP3 with drift detection for the switching bandit problem
Author :
Robin Allesiardo;Raphaël Féraud
Author_Institution :
Orange Labs, 22300 Lannion, France
fYear :
2015
Firstpage :
1
Lastpage :
7
Abstract :
The multi-armed bandit is a model of exploration and exploitation, where one must select, within a finite set of arms, the one which maximizes the cumulative reward up to the time horizon T. For the adversarial multi-armed bandit problem, where the sequence of rewards is chosen by an oblivious adversary, the notion of a single best arm over the whole time horizon is too restrictive for applications such as ad-serving, where the best ad could change over time. In this paper, we consider a variant of the adversarial multi-armed bandit problem, where the time horizon is divided into unknown time periods within which rewards are drawn from stochastic distributions. During each time period, there is an optimal arm which may be different from the optimal arm of the previous time period. We present an algorithm taking advantage of the constant exploration of EXP3 to detect when the best arm changes. Its analysis shows that on a run divided into N periods where the best arm changes, the proposed algorithm achieves a regret in O(N √T log T).
Keywords :
"Algorithm design and analysis","Detectors","Switches","Games","Stochastic processes","Time measurement","Monitoring"
Publisher :
IEEE
Conference_Titel :
2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA)
Print_ISBN :
978-1-4673-8272-4
Type :
conf
DOI :
10.1109/DSAA.2015.7344834
Filename :
7344834
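The idea summarized in the abstract, using the uniform exploration rounds of EXP3 to notice that the best arm has changed, can be illustrated with a minimal Python sketch. The weight update is the standard EXP3 rule. The detection rule is an assumption made for illustration only: the abstract does not specify the paper's actual test. The names `exp3_with_resets`, `window`, `threshold`, and `switching_rewards` are hypothetical, and rewards are assumed to lie in [0, 1].

```python
import math
import random


def exp3_probabilities(weights, gamma):
    """Standard EXP3 mixture: follow the weights, explore uniformly at rate gamma."""
    total = sum(weights)
    k = len(weights)
    return [(1.0 - gamma) * w / total + gamma / k for w in weights]


def exp3_with_resets(reward_fn, n_arms, horizon, gamma=0.1,
                     window=20, threshold=0.2, seed=0):
    """EXP3 plus an illustrative (assumed) drift test on exploration samples."""
    rng = random.Random(seed)
    weights = [1.0] * n_arms
    counts = [0] * n_arms      # exploration pulls per arm in the current interval
    sums = [0.0] * n_arms      # exploration rewards per arm in the current interval
    resets = []
    total_reward = 0.0

    for t in range(horizon):
        probs = exp3_probabilities(weights, gamma)
        # Sampling in two stages yields the same marginal distribution as probs,
        # while letting us tag the rounds that came from uniform exploration.
        explored = rng.random() < gamma
        if explored:
            arm = rng.randrange(n_arms)
        else:
            arm = rng.choices(range(n_arms), weights=weights)[0]

        reward = reward_fn(t, arm)
        total_reward += reward

        # Importance-weighted reward estimate and exponential weight update.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / n_arms)
        # Rescale so the largest weight is 1; the floor avoids underflow to zero.
        top = max(weights)
        weights = [max(w / top, 1e-300) for w in weights]

        if explored:
            counts[arm] += 1
            sums[arm] += reward

        # Assumed detection rule: once every arm has `window` exploration samples,
        # reset the weights if some arm beats the current leader by `threshold`.
        if min(counts) >= window:
            means = [s / c for s, c in zip(sums, counts)]
            leader = max(range(n_arms), key=lambda i: weights[i])
            if max(means) - means[leader] > threshold:
                weights = [1.0] * n_arms
                resets.append(t)
            counts = [0] * n_arms
            sums = [0.0] * n_arms

    return total_reward, resets


def switching_rewards(t, arm):
    """Toy environment: arm 0 is best before round 2500, arm 1 afterwards."""
    best = 0 if t < 2500 else 1
    return 1.0 if arm == best else 0.0


if __name__ == "__main__":
    total, resets = exp3_with_resets(switching_rewards, n_arms=2, horizon=5000)
    print("cumulative reward:", total, "reset rounds:", resets)
```

Only the exploration rounds feed the detection statistics, because those rounds sample every arm uniformly regardless of the current weights. Without a reset, the weight accumulated by the former best arm would take roughly as long to overcome as it took to build.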