• DocumentCode
    2043673
  • Title

    Achieving complete learning in Multi-Armed Bandit problems

  • Author

    Vakili, Sattar ; Qing Zhao

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of California, Davis, Davis, CA, USA
  • fYear
    2013
  • fDate
    3-6 Nov. 2013
  • Firstpage
    1778
  • Lastpage
    1782
  • Abstract
    In the classic Multi-Armed Bandit (MAB) problem, there is a given set of arms with unknown reward distributions. At each time, a player selects one arm to play, aiming to maximize the total expected reward over a horizon of length T. It is known that the minimum growth rate of regret (defined as the total expected loss with respect to the ideal scenario of known reward models of all arms) is logarithmic with T. In other words, mistakes in selecting suboptimal arms occur infinitely often, and the player will never converge to the arm with the largest reward mean. In this paper, we are interested in the questions that whether side information on the reward model can lead to bounded regret, thus, complete learning, and what is the minimum side information to achieve complete learning. We show that the knowledge of a value η between the largest and the second largest reward mean (among all arms) leads to complete learning by constructing an online learning policy with bounded regret. This result applies to both light-tailed and heavy-tailed reward distributions.
  • Keywords
    game theory; learning (artificial intelligence); statistical distributions; MAB problem; bounded regret; complete learning; heavy-tailed reward distribution; light-tailed reward distribution; multiarmed bandit problem; online learning policy; regret minimum growth rate; reward mean; reward model; side information; total expected reward; Computers; Educational institutions; Random variables; Round robin; Sequential analysis; Upper bound;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Signals, Systems and Computers, 2013 Asilomar Conference on
  • Conference_Location
    Pacific Grove, CA
  • Print_ISBN
    978-1-4799-2388-5
  • Type

    conf

  • DOI
    10.1109/ACSSC.2013.6810607
  • Filename
    6810607