• DocumentCode
    2005741
  • Title
    A Bayesian Learning Automaton for Solving Two-Armed Bernoulli Bandit Problems
  • Author
    Granmo, Ole Christoffer
  • Author_Institution
    Dept. of ICT, Univ. of Agder, Grimstad, Norway
  • fYear
    2008
  • fDate
    11-13 Dec. 2008
  • Firstpage
    23
  • Lastpage
    30
  • Abstract
    The two-armed Bernoulli bandit (TABB) problem is a classical optimization problem in which an agent sequentially pulls one of two arms attached to a gambling machine, with each pull resulting in either a reward or a penalty. The reward probability of each arm is unknown, so one must balance exploiting existing knowledge about the arms against obtaining new information. In recent decades, several computationally efficient algorithms for tackling this problem have emerged, with learning automata (LA) being known for their ε-optimality, and confidence-interval-based algorithms for their logarithmically growing regret. Applications include treatment selection in clinical trials, route selection in adaptive routing, and plan exploration in games like Go. The TABB has also been extensively studied from a Bayesian perspective; in general, however, such analysis leads to computationally inefficient solution policies. This paper introduces the Bayesian learning automaton (BLA). The BLA is inherently Bayesian in nature, yet relies simply on counting rewards/penalties and on random sampling from a pair of twin beta distributions. Furthermore, we report that the BLA is self-correcting and converges to pulling only the optimal arm with probability 1. Extensive experiments demonstrate that, in contrast to most LA, the BLA does not rely on external learning speed/accuracy control. It also outperforms recently proposed confidence-interval-based algorithms. We thus believe that the BLA opens the way to improved performance in a number of applications, and that it forms the basis for a new avenue of research.
  • Keywords
    belief networks; learning automata; optimisation; Bayesian learning automaton; optimization problem; twin beta distributions; two-armed Bernoulli bandit problems; Application software; Arm; Artificial intelligence; Bayesian methods; Clinical trials; Learning automata; Machine learning; Resource management; Routing; Sampling methods;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Applications, 2008. ICMLA '08. Seventh International Conference on
  • Conference_Location
    San Diego, CA
  • Print_ISBN
    978-0-7695-3495-4
  • Type
    conf
  • DOI
    10.1109/ICMLA.2008.67
  • Filename
    4724951
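
The abstract describes the BLA as counting rewards/penalties and sampling from a pair of twin beta distributions. A minimal Python sketch of that idea follows. It is an illustration only, not the paper's own pseudocode: the function name run_bla, the uniform Beta(1, 1) starting parameters, the rule of pulling the arm with the larger sample, and the simulated reward probabilities are all assumptions, since this record does not give those details.

```python
import random


def run_bla(reward_probs, rounds, seed=0):
    """Simulate a beta-sampling automaton on a two-armed Bernoulli bandit.

    reward_probs: (p_arm0, p_arm1), the reward probabilities hidden from
    the learner (hypothetical values supplied by the caller).
    Returns (pulls, params): pull counts and final [alpha, beta] per arm.
    """
    rng = random.Random(seed)
    # One [alpha, beta] pair per arm. Starting both at 1 (a uniform
    # prior) is an assumption, not something stated in the abstract.
    params = [[1, 1], [1, 1]]
    pulls = [0, 0]
    for _ in range(rounds):
        # Draw one sample from each arm's beta distribution.
        samples = [rng.betavariate(a, b) for a, b in params]
        # Pull the arm whose sample is larger (assumed selection rule).
        arm = 0 if samples[0] > samples[1] else 1
        pulls[arm] += 1
        # Simulate the Bernoulli outcome and update the counts.
        if rng.random() < reward_probs[arm]:
            params[arm][0] += 1  # reward: increment alpha
        else:
            params[arm][1] += 1  # penalty: increment beta
    return pulls, params


if __name__ == "__main__":
    pulls, params = run_bla((0.9, 0.1), rounds=2000)
    print("pulls per arm:", pulls)
    print("beta parameters per arm:", params)
```

In this sketch there is no learning-rate or exploration parameter to tune, which matches the abstract's remark that the BLA does not rely on external learning speed/accuracy control: the spread of each beta distribution narrows on its own as counts accumulate.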