DocumentCode :
714186
Title :
An estimation based allocation rule with super-linear regret and finite lock-on time for time-dependent multi-armed bandit processes
Author :
Prokopiou, Prokopis C. ; Caines, Peter E. ; Mahajan, Aditya
Author_Institution :
Integrated Program in Neuroscience, McGill University, Montreal, QC, Canada
fYear :
2015
fDate :
3-6 May 2015
Firstpage :
1299
Lastpage :
1306
Abstract :
The multi-armed bandit (MAB) problem has been an active area of research since the early 1930s. The majority of the literature restricts attention to i.i.d. or Markov reward processes. In this paper, the finite-parameter MAB problem with time-dependent reward processes is investigated. An upper confidence bound (UCB) based index policy, where the index is computed from the maximum-likelihood estimate of the unknown parameter, is proposed. This policy locks on to the optimal arm in finite expected time but has a super-linear regret. As an example, the proposed index policy is used for minimizing prediction error when each arm is an auto-regressive moving average (ARMA) process.
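Illustrative sketch :
The abstract describes the policy only at a high level. The Python sketch below illustrates the general shape of an estimation-based UCB index rule (parameter estimate plus exploration bonus); it is not the authors' exact allocation rule, and it does not model the time-dependent ARMA reward processes studied in the paper. The function names, the Gaussian arm model, and the constant c are illustrative assumptions.

import math
import random

def ucb_index(mean_estimate, pulls, t, c=2.0):
    # Generic UCB-style index: current estimate plus a confidence bonus.
    if pulls == 0:
        return float("inf")  # force each arm to be sampled at least once
    return mean_estimate + math.sqrt(c * math.log(t) / pulls)

def run_policy(arms, horizon):
    # arms: list of callables returning a reward sample when pulled.
    pulls = [0] * len(arms)
    means = [0.0] * len(arms)  # running estimates (ML estimate of a Gaussian mean)
    for t in range(1, horizon + 1):
        indices = [ucb_index(means[i], pulls[i], t) for i in range(len(arms))]
        k = max(range(len(arms)), key=indices.__getitem__)
        reward = arms[k]()
        pulls[k] += 1
        means[k] += (reward - means[k]) / pulls[k]  # incremental mean update
    return pulls, means

if __name__ == "__main__":
    # Two illustrative i.i.d. Gaussian arms (a simplification of the paper's setting).
    arms = [lambda: random.gauss(0.0, 1.0), lambda: random.gauss(0.5, 1.0)]
    print(run_policy(arms, 10_000))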
Keywords :
Markov processes; autoregressive moving average processes; maximum likelihood estimation; resource allocation; ARMA process; Markov reward process; UCB based index policy; auto-regressive moving average process; estimation based allocation rule; finite expected time; finite lock-on time; finite-parameter MAB problem; maximum-likelihood estimation; prediction error minimization; superlinear regret; time-dependent multi-armed bandit process; time-dependent reward process; upper confidence bound based index policy; Computers; Indexes; Markov processes; Maximum likelihood estimation; Resource management; Technological innovation;
fLanguage :
English
Publisher :
ieee
Conference_Title :
2015 IEEE 28th Canadian Conference on Electrical and Computer Engineering (CCECE)
Conference_Location :
Halifax, NS
ISSN :
0840-7789
Print_ISBN :
978-1-4799-5827-6
Type :
conf
DOI :
10.1109/CCECE.2015.7129466
Filename :
7129466