Regional Information Center for Science and Technology - On effectiveness of the Mirror Descent Algorithm for a stochastic multi-armed bandit governed by a stationary finite Markov chain

DocumentCode :

664276

Title :

On effectiveness of the Mirror Descent Algorithm for a stochastic multi-armed bandit governed by a stationary finite Markov chain

Author :

Nazin, Alexander ; Miller, B.

Author_Institution :

Lab. for Adaptive & Robust Control Syst., V.A. Trapeznikov Inst. of Control Sci., Russia

Year :

2013

Date :

4-5 Nov. 2013

Firstpage :

244

Lastpage :

250

Abstract :

In this article, we study the effectiveness of the Mirror Descent Randomized Control Algorithm, recently developed for a class of homogeneous finite Markov chains governed by a stochastic multi-armed bandit with unknown mean losses. We prove explicit, non-asymptotic upper and lower bounds for the mean losses at a given (finite) time horizon. These bounds are very similar as functions of the problem parameters and the time horizon, but differ in the logarithmic term and the absolute constant. A numerical example illustrates the theoretical results.

Keywords :

Markov processes; randomised algorithms; absolute constant; finite time horizon; homogeneous finite Markov chains; logarithmic term; mirror descent randomized control algorithm; stationary finite Markov chain; stochastic multiarmed bandit; Convergence; Mirrors; Optimal control; Optimization; Upper bound

Language :

English

Publisher :

IEEE

Conference_Title :

2013 3rd Australian Control Conference (AUCC)

Conference_Location :

Fremantle, WA

Print_ISBN :

978-1-4799-2497-4

Type :

conf

DOI :

10.1109/AUCC.2013.6697280

Filename :

6697280

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=664276