• DocumentCode
    1535838
  • Title
    A Structured Multiarmed Bandit Problem and the Greedy Policy
  • Author
    Mersereau, Adam J. ; Rusmevichientong, Paat ; Tsitsiklis, John N.
  • Author_Institution
    Kenan-Flagler Bus. Sch., Univ. of North Carolina, Chapel Hill, NC, USA
  • Volume
    54
  • Issue
    12
  • fYear
    2009
  • Firstpage
    2787
  • Lastpage
    2802
  • Abstract
    We consider a multiarmed bandit problem where the expected reward of each arm is a linear function of an unknown scalar with a prior distribution. The objective is to choose a sequence of arms that maximizes the expected total (or discounted total) reward. We demonstrate the effectiveness of a greedy policy that takes advantage of the known statistical correlation structure among the arms. In the infinite horizon discounted reward setting, we show that the greedy and optimal policies eventually coincide, and both settle on the best arm. This is in contrast with the Incomplete Learning Theorem for the case of independent arms. In the total reward setting, we show that the cumulative Bayes risk after T periods under the greedy policy is at most O(log T), which is smaller than the lower bound of Ω(log² T) established by Lai for a general, but different, class of bandit problems. We also establish the tightness of our bounds. Theoretical and numerical results show that the performance of our policy scales independently of the number of arms.
  • Keywords
    Bayes methods; Markov processes; greedy algorithms; statistical distributions; cumulative Bayes risk; discounted total reward; expected total reward; greedy policy; incomplete learning theorem; infinite horizon discounted reward setting; linear function; prior distribution; statistical correlation structure; structured multiarmed bandit; Arm; Convergence; Costs; Infinite horizon; Laboratories; Operations research; Prototypes; Markov decision process (MDP)
  • fLanguage
    English
  • Journal_Title
    Automatic Control, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9286
  • Type
    jour
  • DOI
    10.1109/TAC.2009.2031725
  • Filename
    5308361