DocumentCode :
664276
Title :
On effectiveness of the Mirror Descent Algorithm for a stochastic multi-armed bandit governed by a stationary finite Markov chain
Author :
Nazin, Alexander ; Miller, B.
Author_Institution :
Lab. for Adaptive & Robust Control Syst., V.A. Trapeznikov Inst. of Control Sci., Russia
fYear :
2013
fDate :
4-5 Nov. 2013
Firstpage :
244
Lastpage :
250
Abstract :
In this article, we study the effectiveness of the Mirror Descent Randomized Control Algorithm, recently developed for a class of homogeneous finite Markov chains governed by a stochastic multi-armed bandit with unknown mean losses. We prove explicit, non-asymptotic upper and lower bounds on the mean losses at a given (finite) time horizon. These bounds are very similar as functions of the problem parameters and the time horizon, but differ in the logarithmic term and the absolute constant. A numerical example illustrates the theoretical results.
Keywords :
Markov processes; randomised algorithms; absolute constant; finite time horizon; homogeneous finite Markov chains; logarithmic term; mirror descent randomized control algorithm; stationary finite Markov chain; stochastic multiarmed bandit; Convergence; Markov processes; Mirrors; Optimal control; Optimization; Upper bound;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Control Conference (AUCC), 2013 3rd Australian
Conference_Location :
Fremantle, WA
Print_ISBN :
978-1-4799-2497-4
Type :
conf
DOI :
10.1109/AUCC.2013.6697280
Filename :
6697280