Regional Information Center for Science and Technology - Mirror descent algorithm for a multi-armed bandit governed by a stationary finite state Markov chain

DocumentCode :

3583319

Title :

Mirror descent algorithm for a multi-armed bandit governed by a stationary finite state Markov chain

Author :

Nazin, Alexander ; Miller, B.

Author_Institution :

Lab. for Adaptive & Robust Control Syst., Inst. of Control Sci., Moscow, Russia

fYear :

2013

Firstpage :

371

Lastpage :

375

Abstract :

This article further develops an adaptive approach to the control of observable Markov chains with a finite number of states, extending the approach presented in [18]. We apply the Mirror Descent Randomized Control Algorithm (MDRCA) to a class of homogeneous finite Markov chains governed by a multi-armed bandit with unknown mean losses. In contrast to the partially observable Markov decision process, the adaptive approach does not presuppose knowledge of the probabilistic characteristics of the random perturbations, and it yields a control strategy with a known rate of convergence to the optimal solution. We propose a concrete MDRCA and prove an explicit, non-asymptotic upper bound on the mean losses over a given finite time horizon. A numerical example illustrates the theoretical results.
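
Illustrative sketch (not part of the catalogued record, and not the authors' MDRCA): the abstract names mirror descent over randomized controls for a bandit with unknown mean losses. The general idea can be sketched with entropic mirror descent on the probability simplex, driven by importance-weighted loss estimates. The function name, the Bernoulli loss model, the step size, and the omission of the governing Markov chain state are all assumptions made for illustration.

```python
import math
import random


def entropic_mirror_descent_bandit(mean_losses, horizon, step_size, seed=0):
    """Hypothetical sketch: entropic mirror descent on a Bernoulli-loss bandit.

    mean_losses are unknown to the learner; they are used only to simulate
    the observed losses. The Markov chain from the paper is NOT modelled.
    """
    rng = random.Random(seed)
    n_arms = len(mean_losses)
    probs = [1.0 / n_arms] * n_arms  # start from the uniform randomized control
    total_loss = 0.0

    for _ in range(horizon):
        # Draw an arm from the current randomized strategy.
        arm = rng.choices(range(n_arms), weights=probs)[0]
        # Observe a Bernoulli loss for the chosen arm only (bandit feedback).
        loss = 1.0 if rng.random() < mean_losses[arm] else 0.0
        total_loss += loss

        # Importance-weighted estimate of the loss vector: it is nonzero
        # only in the chosen coordinate and unbiased for the mean losses.
        estimate = loss / probs[arm]

        # Mirror step with the entropy mirror map: a multiplicative update
        # followed by renormalisation back onto the simplex.
        probs[arm] *= math.exp(-step_size * estimate)
        norm = sum(probs)
        probs = [p / norm for p in probs]

    return probs, total_loss / horizon


if __name__ == "__main__":
    final_probs, average_loss = entropic_mirror_descent_bandit(
        mean_losses=[0.9, 0.1, 0.8], horizon=2000, step_size=0.02, seed=1
    )
    print(final_probs, average_loss)
```

The multiplicative update is what the entropy mirror map gives on the simplex. A practical variant would typically mix in a small amount of uniform exploration to keep the importance weights bounded; that refinement is left out here for brevity.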

Keywords :

Markov processes; adaptive control; finite state machines; optimal control; randomised algorithms; MDRCA; control strategy; homogeneous finite Markov chains; mirror descent algorithm; mirror descent randomized control algorithm; multiarmed bandit; partially observable Markov decision process; stationary finite state Markov chain; Convergence; Internet; Markov processes; Mirrors; Optimal control; Upper bound; Vectors;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Control Conference (ECC), 2013 European

Type :

conf

Filename :

6669310

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3583319