Regional Information Center for Science and Technology (RICeST) - Adaptive learning of uncontrolled restless bandits with logarithmic regret

DocumentCode :

2887180

Title :

Adaptive learning of uncontrolled restless bandits with logarithmic regret

Author :

Tekin, Cem ; Liu, Mingyan

Author_Institution :

Dept. of Electr. Eng. & Comput. Sci., Univ. of Michigan, Ann Arbor, MI, USA

fYear :

2011

fDate :

28-30 Sept. 2011

Firstpage :

983

Lastpage :

990

Abstract :

In this paper, we consider the problem of learning the optimal policy for the uncontrolled restless bandit problem. In this problem, only the state of the selected arm can be observed, the state transitions are independent of the control, and the transition law is unknown. We propose a learning algorithm that achieves logarithmic regret, uniformly over time, with respect to the optimal finite-horizon policy with known transition law, under some assumptions on the transition probabilities of the arms and on the structure of the optimal stationary policy for the infinite-horizon average-reward problem.
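The abstract specifies the problem setting but not the learning algorithm itself. As a hedged illustration of that setting only (not the authors' algorithm), the Python sketch below simulates an uncontrolled restless bandit: every arm is a finite-state Markov chain that transitions at every step whether or not it is selected, and the learner sees only the state of the arm it selects. The class name `UncontrolledRestlessBandit`, the reward-per-state table, and the observe-then-transition ordering are illustrative assumptions, not details taken from the paper.

```python
import random


class UncontrolledRestlessBandit:
    """Hypothetical simulator of the uncontrolled restless bandit setting.

    Each arm is a finite-state Markov chain whose state evolves every step,
    independent of which arm is selected. Only the selected arm's state
    (and its reward) is revealed to the learner.
    """

    def __init__(self, transitions, rewards, initial_states, seed=0):
        # transitions[k][i][j]: probability that arm k moves from state i to j.
        # In the learning problem this transition law is unknown to the learner.
        self.P = transitions
        # rewards[k][i]: reward collected when arm k is selected in state i
        # (assumed representation for this sketch).
        self.r = rewards
        self.states = list(initial_states)
        self.rng = random.Random(seed)

    def _step_chain(self, k):
        # Draw arm k's next state from the row of its transition matrix.
        row = self.P[k][self.states[k]]
        self.states[k] = self.rng.choices(range(len(row)), weights=row)[0]

    def pull(self, arm):
        # Only the selected arm's current state and reward are observed.
        observed_state = self.states[arm]
        reward = self.r[arm][observed_state]
        # "Uncontrolled": all arms transition, regardless of the action taken.
        for k in range(len(self.states)):
            self._step_chain(k)
        return observed_state, reward


if __name__ == "__main__":
    # Arm 0 never changes state; arm 1 deterministically flips 0 <-> 1.
    bandit = UncontrolledRestlessBandit(
        transitions=[[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]],
        rewards=[[0.0, 1.0], [0.0, 1.0]],
        initial_states=[0, 0],
    )
    bandit.pull(0)          # arm 1 flips to state 1 even though it was not selected
    print(bandit.pull(1))   # observes arm 1 in state 1
```

The usage example shows why the setting is hard to learn: an arm's state keeps changing while it is not observed, so a learner must reason about the unseen evolution of the arms rather than only the last state it saw.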

Keywords :

Markov processes; computational complexity; learning (artificial intelligence); adaptive learning algorithm; infinite horizon average reward problem; logarithmic regret; optimal finite horizon policy; optimal stationary policy; state transition; transition law; transition probability; uncontrolled restless bandit problem; Equations; History; Indexes; Markov processes; Mathematical model; Upper bound; Vectors;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Communication, Control, and Computing (Allerton), 2011 49th Annual Allerton Conference on

Conference_Location :

Monticello, IL

Print_ISBN :

978-1-4577-1817-5

Type :

conf

DOI :

10.1109/Allerton.2011.6120273

Filename :

6120273

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2887180