• DocumentCode
    2734757
  • Title

    Using upper confidence bounds for online learning

  • Author

    Auer, Peter

  • Author_Institution
    Inst. for Theor. Comput. Sci., Graz Univ. of Technol., Austria
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    270
  • Lastpage
    279
  • Abstract
    We show how a standard tool from statistics, namely confidence bounds, can be used to elegantly deal with situations which exhibit an exploitation/exploration trade-off. Our technique for designing and analyzing algorithms for such situations is very general and can be applied when an algorithm has to make exploitation-versus-exploration decisions based on uncertain information provided by a random process. We consider two models with such an exploitation/exploration trade-off. For the adversarial bandit problem our new algorithm suffers only O˜(T1/2) regret over T trials which improves significantly over the previously best O˜(T2/3) regret. We also extend our results for the adversarial bandit problem to shifting bandits. The second model we consider is associative reinforcement learning with linear value functions. For this model our technique improves the regret from O˜(T3/4) to O˜(T1/2)
  • Keywords
    learning (artificial intelligence); random processes; statistical analysis; uncertainty handling; adversarial bandit problem; associative reinforcement learning; exploitation decision; exploration decision; linear value functions; online learning; random process; statistics; uncertain information; upper confidence bounds; Algorithm design and analysis; Computer science; Information analysis; Learning; Random processes; Random variables; Statistical analysis; Uncertainty;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium on
  • Conference_Location
    Redondo Beach, CA
  • ISSN
    0272-5428
  • Print_ISBN
    0-7695-0850-2
  • Type

    conf

  • DOI
    10.1109/SFCS.2000.892116
  • Filename
    892116