Title :
An estimation based allocation rule with super-linear regret and finite lock-on time for time-dependent multi-armed bandit processes
Author :
Prokopiou, Prokopis C. ; Caines, Peter E. ; Mahajan, Aditya
Author_Institution :
Integrated Program in Neurosci., McGill Univ., Montreal, QC, Canada
Abstract :
The multi-armed bandit (MAB) problem has been an active area of research since the early 1930s. The majority of the literature restricts attention to i.i.d. or Markov reward processes. In this paper, the finite-parameter MAB problem with time-dependent reward processes is investigated. An upper confidence bound (UCB) based index policy, where the index is computed based on the maximum-likelihood estimate of the unknown parameter, is proposed. This policy locks on to the optimal arm in finite expected time but has a super-linear regret. As an example, the proposed index policy is used for minimizing prediction error when each arm is a auto-regressive moving average (ARMA) process.
Keywords :
Markov processes; autoregressive moving average processes; maximum likelihood estimation; resource allocation; ARMA process; Markov reward process; UCB based index policy; auto-regressive moving average process; estimation based allocation rule; finite expected time; finite lock-on time; finite-parameter MAB problem; maximum-likelihood estimation; prediction error minimizaion; superlinear regret; time-dependent multiarmed bandit process; time-dependent reward process; upper confidence bound based index policy; Computers; Indexes; Markov processes; Maximum likelihood estimation; Resource management; Technological innovation;
Conference_Titel :
Electrical and Computer Engineering (CCECE), 2015 IEEE 28th Canadian Conference on
Conference_Location :
Halifax, NS
Print_ISBN :
978-1-4799-5827-6
DOI :
10.1109/CCECE.2015.7129466