• DocumentCode
    849453
  • Title

    Extensions of the multiarmed bandit problem: The discounted case

  • Author

    Varaiya, Pravin P. ; Walrand, Jean C. ; Buyukkoc, Cagatay

  • Author_Institution
    University of California, Berkeley, CA, USA
  • Volume
    30
  • Issue
    5
  • fYear
    1985
  • fDate
    5/1/1985 12:00:00 AM
  • Firstpage
    426
  • Lastpage
    439
  • Abstract
    There are N independent machines. Machine i is described by a sequence {X^{i}(s), F^{i}(s)}, where X^{i}(s) is the immediate reward and F^{i}(s) is the information available before i is operated for the s-th time. At each time one operates exactly one machine; idle machines remain frozen. The problem is to schedule the operation of the machines so as to maximize the expected total discounted sequence of rewards. An elementary proof shows that to each machine is associated an index, and the optimal policy operates the machine with the largest current index. When the machines are completely observed Markov chains, this coincides with the well-known Gittins index rule, and new algorithms are given for calculating the index. A reformulation of the bandit problem yields the tax problem, which includes, as a special case, Klimov's waiting time problem. Using the concept of superprocess, an index rule is derived for the case where new machines arrive randomly. Finally, continuous time versions of these problems are considered for both preemptive and nonpreemptive disciplines.
  • Keywords
    Markov processes; Optimal stochastic control; Scheduling; Stochastic optimal control; Dynamic programming; Dynamic scheduling; Laboratories; Network address translation
  • fLanguage
    English
  • Journal_Title
    Automatic Control, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9286
  • Type

    jour

  • DOI
    10.1109/TAC.1985.1103989
  • Filename
    1103989