• DocumentCode
    3588075
  • Title

    Time-varying stochastic multi-armed bandit problems

  • Author

    Vakili, Sattar ; Qing Zhao ; Yuan Zhou

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of California, Davis, Davis, CA, USA
  • fYear
    2014
  • Firstpage
    2103
  • Lastpage
    2107
  • Abstract
    In this paper, we consider a time-varying stochastic multi-armed bandit (MAB) problem where the unknown reward distribution of each arm can change arbitrarily over time. We obtain a lower bound on the regret order and demonstrate that an online learning algorithm achieves this lower bound. We further consider a piece-wise stationary model of the arm reward distributions and establish the regret performance of an online learning algorithm in terms of the number of change points experienced by the reward distributions over the time horizon.
  • Keywords
    game theory; learning (artificial intelligence); stochastic processes; MAB problem; arm reward distribution; online learning algorithm; piece-wise stationary model; time-varying stochastic multiarmed bandit problems; unknown reward distribution; Computers; Current measurement; Length measurement; Partitioning algorithms; Stochastic processes; Switches; Time measurement;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Signals, Systems and Computers, 2014 48th Asilomar Conference on
  • Print_ISBN
    978-1-4799-8295-0
  • Type

    conf

  • DOI
    10.1109/ACSSC.2014.7094845
  • Filename
    7094845