DocumentCode
3588075
Title
Time-varying stochastic multi-armed bandit problems
Author
Vakili, Sattar ; Qing Zhao ; Yuan Zhou
Author_Institution
Dept. of Electr. & Comput. Eng., Univ. of California, Davis, Davis, CA, USA
fYear
2014
Firstpage
2103
Lastpage
2107
Abstract
In this paper, we consider a time-varying stochastic multi-armed bandit (MAB) problem where the unknown reward distribution of each arm can change arbitrarily over time. We obtain a lower bound on the regret order and demonstrate that an online learning algorithm achieves this lower bound. We further consider a piece-wise stationary model of the arm reward distributions and establish the regret performance of an online learning algorithm in terms of the number of change points experienced by the reward distributions over the time horizon.
Keywords
game theory; learning (artificial intelligence); stochastic processes; MAB problem; arm reward distribution; online learning algorithm; piece-wise stationary model; time-varying stochastic multiarmed bandit problems; unknown reward distribution; Computers; Current measurement; Length measurement; Partitioning algorithms; Stochastic processes; Switches; Time measurement;
fLanguage
English
Publisher
ieee
Conference_Titel
Signals, Systems and Computers, 2014 48th Asilomar Conference on
Print_ISBN
978-1-4799-8295-0
Type
conf
DOI
10.1109/ACSSC.2014.7094845
Filename
7094845
Link To Document