Title :
On effectiveness of the Mirror Descent Algorithm for a stochastic multi-armed bandit governed by a stationary finite Markov chain
Author :
Nazin, Alexander ; Miller, B.
Author_Institution :
Lab. for Adaptive & Robust Control Syst., V.A. Trapeznikov Inst. of Control Sci., Russia
Abstract :
In this article, we study the effectiveness of the Mirror Descent Randomized Control Algorithm, recently developed for a class of homogeneous finite Markov chains governed by a stochastic multi-armed bandit with unknown mean losses. We prove explicit, non-asymptotic upper and lower bounds on the mean losses at a given (finite) time horizon. As functions of the problem parameters and the time horizon, these bounds are very similar, but they differ in the logarithmic term and the absolute constant. A numerical example illustrates the theoretical results.
Keywords :
Markov processes; randomised algorithms; absolute constant; finite time horizon; homogeneous finite Markov chains; logarithmic term; mirror descent randomized control algorithm; stationary finite Markov chain; stochastic multiarmed bandit; Convergence; Mirrors; Optimal control; Optimization; Upper bound
Conference_Title :
Control Conference (AUCC), 2013 3rd Australian
Conference_Location :
Fremantle, WA
Print_ISBN :
978-1-4799-2497-4
DOI :
10.1109/AUCC.2013.6697280