Regional Information Center for Science and Technology - Extension of the multi-armed bandit problem

DocumentCode :

3054010

Title :

Extension of the multi-armed bandit problem

Author :

Varaiya, P. ; Walrand, J. ; Buyukkoc, C.

Author_Institution :

University of California, Berkeley, CA

fYear :

1983

fDate :

Dec. 1983

Firstpage :

1179

Lastpage :

1180

Abstract :

There are N independent machines. Machine i is described by a sequence {X_i(s), F_i(s)}, where X_i(s) is the immediate reward and F_i(s) is the information available before machine i is operated for the s-th time. At each time exactly one machine is operated; idle machines remain frozen. The problem is to schedule the operation of the machines so as to maximize the expected total discounted reward. The main result is that each machine has an associated index, and the optimal policy operates the machine with the largest current index.
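
The index rule in the abstract can be illustrated with a small sketch. It is not taken from the paper: it assumes the simplest special case, in which each machine's reward sequence is deterministic, finite, and nonnegative, so that the index of a machine already operated s times reduces to the best ratio of discounted reward to discounted time over the remaining horizons. The discount factor beta, the function names (index, index_policy, brute_force), and the example reward sequences are all hypothetical choices made for illustration.

```python
from itertools import permutations


def index(rewards, s, beta):
    """Index of a machine already operated s times (assumed deterministic case).

    Takes the maximum, over every remaining horizon, of
    (discounted reward collected) / (discounted time spent).
    Returns -inf once the reward sequence is exhausted.
    """
    best = float("-inf")
    num = den = 0.0
    for k, x in enumerate(rewards[s:]):
        num += beta**k * x
        den += beta**k
        best = max(best, num / den)
    return best


def index_policy(machines, beta):
    """Operate, at each time, the machine with the largest current index."""
    counts = [0] * len(machines)  # how often each machine has been operated
    total, t, schedule = 0.0, 0, []
    while any(c < len(m) for c, m in zip(counts, machines)):
        i = max(range(len(machines)),
                key=lambda j: index(machines[j], counts[j], beta))
        total += beta**t * machines[i][counts[i]]
        counts[i] += 1  # only the operated machine advances; the rest stay frozen
        schedule.append(i)
        t += 1
    return total, schedule


def brute_force(machines, beta):
    """Best total discounted reward over every possible interleaving."""
    labels = [i for i, m in enumerate(machines) for _ in m]
    best = float("-inf")
    for order in set(permutations(labels)):
        counts = [0] * len(machines)
        total = 0.0
        for t, i in enumerate(order):
            total += beta**t * machines[i][counts[i]]
            counts[i] += 1
        best = max(best, total)
    return best


if __name__ == "__main__":
    # Hypothetical reward sequences, one list per machine.
    machines = [[1.0, 9.0, 0.5], [5.0, 2.0], [3.0, 3.0]]
    total, schedule = index_policy(machines, beta=0.9)
    print("schedule:", schedule)
    print("index policy total:", total)
    print("brute-force optimum:", brute_force(machines, beta=0.9))
```

In this toy instance the index policy first operates machine 1 for its reward of 5, then switches to machine 0 and accepts a low reward of 1 in order to reach the reward of 9 behind it. The brute-force search is included only as a sanity check that the index schedule attains the optimum on small inputs; it is not practical beyond a handful of operations.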

Keywords :

Laboratories

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Decision and Control, 1983. The 22nd IEEE Conference on

Conference_Location :

San Antonio, TX, USA

Type :

conf

DOI :

10.1109/CDC.1983.269708

Filename :

4047739

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3054010