DocumentCode :
2887180
Title :
Adaptive learning of uncontrolled restless bandits with logarithmic regret
Author :
Tekin, Cem ; Liu, Mingyan
Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., Univ. of Michigan, Ann Arbor, MI, USA
fYear :
2011
fDate :
28-30 Sept. 2011
Firstpage :
983
Lastpage :
990
Abstract :
In this paper we consider the problem of learning the optimal policy for the uncontrolled restless bandit problem. In this problem only the state of the selected arm can be observed, the state transitions are independent of control and the transition law is unknown. We propose a learning algorithm which gives logarithmic regret uniformly over time with respect to the optimal finite horizon policy with known transition law under some assumptions on the transition probabilities of the arms and the structure of the optimal stationary policy for the infinite horizon average reward problem.
Keywords :
Markov processes; computational complexity; learning (artificial intelligence); adaptive learning algorithm; infinite horizon average reward problem; logarithmic regret; optimal finite horizon policy; optimal stationary policy; state transition; transition law; transition probability; uncontrolled restless bandit problem; Equations; History; Indexes; Markov processes; Mathematical model; Upper bound; Vectors;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Communication, Control, and Computing (Allerton), 2011 49th Annual Allerton Conference on
Conference_Location :
Monticello, IL
Print_ISBN :
978-1-4577-1817-5
Type :
conf
DOI :
10.1109/Allerton.2011.6120273
Filename :
6120273
Link To Document :
بازگشت