A Bayesian Learning Automaton for Solving Two-Armed Bernoulli Bandit Problems

Author

Granmo, Ole Christoffer

Author_Institution

Dept. of ICT, Univ. of Agder, Grimstad, Norway

fYear

2008

fDate

11-13 Dec. 2008

Firstpage

23

Lastpage

30

Abstract

The two-armed Bernoulli bandit (TABB) problem is a classical optimization problem where an agent sequentially pulls one of two arms attached to a gambling machine, with each pull resulting in either a reward or a penalty. The reward probability of each arm is unknown, and thus one must balance exploiting existing knowledge about the arms against obtaining new information. In recent decades, several computationally efficient algorithms for tackling this problem have emerged, with learning automata (LA) being known for their ε-optimality, and confidence interval based algorithms for logarithmically growing regret. Applications include treatment selection in clinical trials, route selection in adaptive routing, and plan exploration in games like Go. The TABB has also been extensively studied from a Bayesian perspective; however, in general, such analysis leads to computationally inefficient solution policies. This paper introduces the Bayesian learning automaton (BLA). The BLA is inherently Bayesian in nature, yet relies simply on counting rewards/penalties and on random sampling from a pair of twin beta distributions. Furthermore, we report that BLA is self-correcting and converges to only pulling the optimal arm with probability 1. Extensive experiments demonstrate that, in contrast to most LA, BLA does not rely on external learning speed/accuracy control. It also outperforms recently proposed confidence interval based algorithms. We thus believe that BLA opens the way to improved performance in a number of applications, and that it forms the basis for a new avenue of research.
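
The scheme described in the abstract (counting rewards and penalties per arm, then sampling from a pair of beta distributions to choose an arm) can be illustrated with a minimal Python sketch. This is not the paper's own code: the class and function names, the uniform Beta(1, 1) starting counts, and the simulated reward probabilities are all assumptions made for illustration.

```python
import random


class BayesianLearningAutomaton:
    """Illustrative two-armed sketch; names and the prior are assumptions."""

    def __init__(self, rng=None):
        self.rng = rng or random.Random()
        # One [alpha, beta] pair per arm. Starting at [1, 1] corresponds to a
        # uniform Beta(1, 1) prior, which is an assumption of this sketch.
        self.counts = [[1, 1], [1, 1]]

    def select_arm(self):
        # Draw one sample from each arm's beta distribution and pull the arm
        # whose sample is larger.
        samples = [self.rng.betavariate(a, b) for a, b in self.counts]
        return 0 if samples[0] > samples[1] else 1

    def update(self, arm, rewarded):
        # A reward increments the arm's alpha count, a penalty its beta count.
        if rewarded:
            self.counts[arm][0] += 1
        else:
            self.counts[arm][1] += 1


def simulate(reward_probs, pulls, seed=0):
    """Run the automaton against two simulated Bernoulli arms."""
    rng = random.Random(seed)
    bla = BayesianLearningAutomaton(rng)
    pulls_per_arm = [0, 0]
    for _ in range(pulls):
        arm = bla.select_arm()
        rewarded = rng.random() < reward_probs[arm]
        bla.update(arm, rewarded)
        pulls_per_arm[arm] += 1
    return bla, pulls_per_arm


if __name__ == "__main__":
    # Hypothetical reward probabilities, chosen only for the demonstration.
    bla, pulls_per_arm = simulate((0.9, 0.1), pulls=2000, seed=1)
    print("pulls per arm:", pulls_per_arm)
    print("reward/penalty counts:", bla.counts)
```

Note that the sketch has no learning-rate or accuracy parameter to tune, which matches the abstract's remark that BLA does not rely on external learning speed/accuracy control.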

Keywords

belief networks; learning automata; optimisation; Bayesian learning automaton; optimization problem; twin beta distributions; two-armed Bernoulli bandit problems; Application software; Arm; Artificial intelligence; Bayesian methods; Clinical trials; Learning automata; Machine learning; Resource management; Routing; Sampling methods

fLanguage

English

Publisher

IEEE

Conference_Titel

Machine Learning and Applications, 2008. ICMLA '08. Seventh International Conference on

Conference_Location

San Diego, CA

Print_ISBN

978-0-7695-3495-4

Type

conf

DOI

10.1109/ICMLA.2008.67

Filename

4724951