• DocumentCode
    1411608
  • Title

    On the value of learning for Bernoulli bandits with unknown parameters

  • Author

    Bhulai, Sandjai ; Koole, Ger

  • Author_Institution
    Vrije Univ., Amsterdam, Netherlands
  • Volume
    45
  • Issue
    11
  • fYear
    2000
  • fDate
    11/1/2000 12:00:00 AM
  • Firstpage
    2135
  • Lastpage
    2140
  • Abstract
    Investigates the multiarmed bandit problem, where each arm generates an infinite sequence of Bernoulli distributed rewards. The parameters of these Bernoulli distributions are unknown and initially assumed to be beta-distributed. Every time a bandit is selected, its beta-distribution is updated to new information in a Bayesian way. The objective is to maximize the long-term discounted rewards. We study the relationship between the necessity of acquiring additional information and the reward. This is done by considering two extreme situations, which occur when a bandit has been played N times: the situation where the decision maker stops learning and the situation where the decision maker acquires full information about that bandit. We show that the difference in reward between this lower and upper bound goes to zero as N grows large.
  • Keywords
    Bayes methods; Markov processes; adaptive control; decision theory; game theory; probability; Bernoulli bandits; Bernoulli distributed rewards; Bernoulli distributions; beta-distribution; decision maker; infinite sequence; long-term discounted rewards; multiarmed bandit problem; Adaptive control; Arm; Bayesian methods; Clinical trials; Closed-form solution; Diseases; Dynamic programming; Equations; Minimax techniques; Upper bound;
  • fLanguage
    English
  • Journal_Title
    Automatic Control, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    0018-9286
  • Type

    jour

  • DOI
    10.1109/9.887641
  • Filename
    887641