DocumentCode
1743880
Title
On the value of learning for Bernoulli bandits with unknown parameters
Author
Bhulai, Sandjai ; Koole, Ger
Author_Institution
Dept. of Math. & Comput. Sci., Vrije Univ., Amsterdam, Netherlands
Volume
1
fYear
2000
fDate
2000
Firstpage
736
Abstract
We investigate the multi-armed bandit problem, where each arm generates an infinite sequence of Bernoulli-distributed rewards. The parameters of these Bernoulli distributions are unknown and initially assumed to be beta-distributed. Every time a bandit is selected, its beta distribution is updated with the new information in a Bayesian way. The objective is to maximize the long-term discounted rewards. We study the relationship between the necessity of acquiring additional information and the reward. This is done by considering two extreme situations that occur when a bandit has been played N times: the situation where the decision maker stops learning, and the situation where the decision maker acquires full information about that bandit. We show that the difference in reward between this lower and upper bound goes to zero as N grows large.
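The Bayesian update mentioned in the abstract can be illustrated with the standard beta-Bernoulli conjugate rule: a success increments the first beta parameter and a failure increments the second. The following is a minimal sketch of that textbook rule only, not the authors' implementation; the class name `BetaPosterior`, its fields, and the uniform Beta(1, 1) starting prior are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPosterior:
    """Beta(alpha, beta) belief about one arm's unknown success probability.

    Hypothetical helper for illustration; the names are not from the paper.
    """

    alpha: float
    beta: float

    def update(self, reward: int) -> "BetaPosterior":
        """Return the posterior after observing one Bernoulli reward (0 or 1)."""
        if reward not in (0, 1):
            raise ValueError("reward must be 0 or 1")
        # Conjugate update: a success raises alpha, a failure raises beta.
        return BetaPosterior(self.alpha + reward, self.beta + (1 - reward))

    def mean(self) -> float:
        """Posterior mean of the success probability."""
        return self.alpha / (self.alpha + self.beta)


# Assumed starting point: a uniform Beta(1, 1) prior on the arm's parameter.
belief = BetaPosterior(1.0, 1.0)
for observed_reward in (1, 1, 0):
    belief = belief.update(observed_reward)
```

After two successes and one failure the belief in this sketch is Beta(3, 2), so the parameters simply count observed outcomes on top of the prior.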
Keywords
Bayes methods; Markov processes; decision theory; game theory; Bernoulli bandits; Bernoulli distributed rewards; beta-distribution; decision maker; long term discounted rewards; multi-armed bandit problem; unknown parameters; Adaptive control; Arm; Bayesian methods; Closed-form solution; Computer science; Dynamic programming; Equations; Mathematics; Minimax techniques; Upper bound
fLanguage
English
Publisher
IEEE
Conference_Titel
Decision and Control, 2000. Proceedings of the 39th IEEE Conference on
Conference_Location
Sydney, NSW
ISSN
0191-2216
Print_ISBN
0-7803-6638-7
Type
conf
DOI
10.1109/CDC.2000.912856
Filename
912856