Title :
Weighted Restless Bandit and Its Applications
Author :
Peng-Jun Wan ; Xiaohua Xu
Author_Institution :
Dept. of Comput. Sci., Illinois Inst. of Technol., Chicago, IL, USA
Date :
June 29, 2015 - July 2, 2015
Abstract :
Motivated by applications such as cognitive radio spectrum scheduling, downlink fading channel scheduling, and unmanned aerial vehicle dynamic routing, we study two restless bandit problems. Given a bandit consisting of multiple restless arms, the state of each arm evolves as a Markov chain, and each arm is associated with a positive weight. At each step, we select a subset of arms to play such that the weighted sum of the selected arms does not exceed a given limit. The reward of playing each arm varies with the arm's state, and the exact state of an arm is revealed only when that arm is played. The weighted restless bandit problem aims to maximize the expected average reward over the infinite horizon. We also study an extension, the multiply-constrained restless bandit problem, in which the selected arms must satisfy two simultaneous constraints at each step: first, their weighted sum cannot exceed a given limit; second, their number is at most a constant K. The objective of the multiply-constrained restless bandit problem is likewise to maximize the long-term average reward. Both problems are partially observable Markov decision processes and have been proved PSPACE-hard even in special cases. We propose constant-approximation algorithms for both problems. Our method involves solving a semi-infinite program, converting its solution back to a low-complexity policy, and accounting for the average reward via a Lyapunov function analysis.
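The problem setup in the abstract can be illustrated with a minimal simulation sketch. All parameters below (two-state good/bad arms, the transition probabilities, and the myopic reward-per-weight policy) are illustrative assumptions for exposition, not the constant-approximation algorithm proposed in the paper:

```python
import random

class Arm:
    """One restless arm: a hidden two-state (good/bad) Markov chain that
    evolves every step, whether or not the arm is played."""
    def __init__(self, weight, reward, p01, p11, rng):
        self.weight = weight            # weight counted against the budget
        self.reward = reward            # reward earned when played in the good state
        self.p01, self.p11 = p01, p11   # P(bad -> good), P(good -> good)
        self.state = 1                  # hidden state: 1 = good, 0 = bad
        self.belief = 1.0               # P(state == good) given past observations
        self.rng = rng

    def play(self):
        # Playing reveals the true state and earns the state-dependent reward.
        self.belief = float(self.state)
        return self.reward * self.state

    def step(self):
        # Hidden state transitions; the belief is propagated through the
        # same transition probabilities (POMDP belief update without observation).
        p = self.p11 if self.state else self.p01
        self.state = 1 if self.rng.random() < p else 0
        self.belief = self.belief * self.p11 + (1 - self.belief) * self.p01

def greedy_select(arms, budget):
    """Myopic knapsack-style selection: rank arms by expected reward per
    unit weight and add them while the weight budget permits."""
    order = sorted(range(len(arms)),
                   key=lambda i: arms[i].belief * arms[i].reward / arms[i].weight,
                   reverse=True)
    chosen, used = [], 0.0
    for i in order:
        if used + arms[i].weight <= budget:
            chosen.append(i)
            used += arms[i].weight
    return chosen

def run(arms, budget, horizon):
    """Average per-step reward of the myopic policy over a finite horizon."""
    total = 0.0
    for _ in range(horizon):
        for i in greedy_select(arms, budget):
            total += arms[i].play()
        for a in arms:
            a.step()
    return total / horizon
```

Adding the cardinality cap of the multiply-constrained variant amounts to one extra check in `greedy_select` (stop once `K` arms are chosen); the paper's contribution is a policy with a provable constant approximation ratio, which this myopic heuristic does not guarantee.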
Keywords :
Lyapunov methods; Markov processes; approximation theory; decision making; infinite horizon; observability; optimisation; Lyapunov function analysis; Markov chain; PSPACE-hard processes; constant approximation algorithms; expected average reward maximization; infinite horizon; long-term average reward maximization; multiply-constrained restless bandit problems; partially observable Markov decision process; semi-infinite program; weighted restless bandit problem; Approximation methods; Downlink; Dynamic scheduling; Fading; Indexes; Markov processes; Routing; Approximation algorithms; multi-armed bandits; restless bandits
Conference_Titel :
Distributed Computing Systems (ICDCS), 2015 IEEE 35th International Conference on
Conference_Location :
Columbus, OH
DOI :
10.1109/ICDCS.2015.58