DocumentCode :
3054010
Title :
Extension of the multi-armed bandit problem
Author :
Varaiya, P. ; Walrand, J. ; Buyukkoc, C.
Author_Institution :
University of California, Berkeley, CA
fYear :
1983
fDate :
Dec. 1983
Firstpage :
1179
Lastpage :
1180
Abstract :
There are N independent machines. Machine i is described by a sequence {x_i(s), F_i(s)}, where x_i(s) is the immediate reward and F_i(s) is the information available before machine i is operated for the s-th time. At each time exactly one machine is operated; idle machines remain frozen. The problem is to schedule the operation of the machines so as to maximize the expected total discounted sum of rewards. The main result is that an index is associated with each machine, and the optimal policy operates the machine with the largest current index.
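The index rule in the abstract can be illustrated in a heavily simplified special case. The sketch below assumes each machine's reward sequence is deterministic, finite, and nonnegative, so the information F_i(s) is trivial. In that case the classical (Gittins) index of a machine reduces to the best ratio of discounted reward to discounted time over its next k operations, maximized over k >= 1. This is not the paper's general construction, and the helper names `gittins_index`, `index_policy`, and `best_interleaving` are hypothetical. The brute-force search checks the index policy against every interleaving on small cases.

```python
from itertools import permutations
from typing import List, Sequence


def gittins_index(rewards: Sequence[float], beta: float) -> float:
    """Index of a deterministic, finite, nonnegative reward sequence.

    Assumption: rewards are known in advance, so the index is the best
    ratio (discounted reward) / (discounted time) over the first k >= 1
    operations. With nonnegative rewards, stopping past the end never helps.
    """
    best = float("-inf")
    num = 0.0  # discounted reward of the first k operations
    den = 0.0  # discounted time of the first k operations
    for t, r in enumerate(rewards):
        num += beta**t * r
        den += beta**t
        best = max(best, num / den)
    return best


def index_policy(machines: List[List[float]], beta: float) -> float:
    """Always operate the unexhausted machine with the largest current index.

    Idle machines stay frozen: only the operated machine's position advances.
    Returns the total discounted reward collected.
    """
    pos = [0] * len(machines)  # times each machine has been operated so far
    total = 0.0
    for t in range(sum(len(m) for m in machines)):
        candidates = [i for i, m in enumerate(machines) if pos[i] < len(m)]
        i = max(candidates, key=lambda j: gittins_index(machines[j][pos[j]:], beta))
        total += beta**t * machines[i][pos[i]]
        pos[i] += 1
    return total


def best_interleaving(machines: List[List[float]], beta: float) -> float:
    """Brute force: best discounted total over all orders of operation."""
    labels = [i for i, m in enumerate(machines) for _ in m]
    best = float("-inf")
    for order in set(permutations(labels)):
        pos = [0] * len(machines)
        value = 0.0
        for t, i in enumerate(order):
            value += beta**t * machines[i][pos[i]]
            pos[i] += 1
        best = max(best, value)
    return best


if __name__ == "__main__":
    machines = [[1.0, 10.0], [5.0]]
    # Both values are about 14.05: operate machine 0 twice, then machine 1.
    print(index_policy(machines, 0.9), best_interleaving(machines, 0.9))
```

In the example, machine 0 starts with a low reward, but its index (10/1.9, about 5.26) exceeds machine 1's index of 5 because of the large reward that follows. The index rule therefore operates machine 0 first, which a myopic "largest immediate reward" rule would not do.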
Keywords :
Laboratories;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Decision and Control, 1983. The 22nd IEEE Conference on
Conference_Location :
San Antonio, TX, USA
Type :
conf
DOI :
10.1109/CDC.1983.269708
Filename :
4047739