The two-armed-bandit problem with time-invariant finite memory

Author

Cover, Thomas M. ; Hellman, Martin E.

Volume

Issue

fYear

1970

fDate

3/1/1970 12:00:00 AM

Firstpage

185

Lastpage

195

Abstract

This paper solves the classical two-armed-bandit problem under the finite-memory constraint described below. Given are probability densities $p_0$ and $p_1$ , and two experiments $A$ and $B$ . It is not known which density is associated with which experiment. Thus the experimental outcome $Y$ of experiment $A$ is as likely to be distributed according to $p_0$ as it is to be distributed according to $p_1$ . It is desired to sequentially choose an experiment to be performed on the basis of past observations according to the algorithm $T_n = f(T_{n-1}, e_n, Y_n), e_n = e(T_{n-1})$ , where $T_n \\in {1, 2, \\cdots , m}$ is the state of memory at time $n, e_n \\in {A, B}$ is the choice of experiment, and $Y_n$ , is the random variable observation. The goal is to maximize the asymptotic proportion $r$ of uses of the experiment associated with density $p_0$ . Let $l(y) = p_0 (y) / p_1 (y)$ , and let $\\bar{l}$ and $\\bar{\\bar{l}}$ denote the almost everywhere greatest lower bound and least upper bound on $l(y)$ . Let $1 = \\max {\\bar{\\bar{l}}, 1/\\bar{l}}$ . Then the optimal value of $r$ , over all $m$ -state algorithms $(f, e)$ , will be shown to be $l^{m-1} / (l^{m-1} + 1)$ . An $e$ -optimal family of $m$ -state algorithms will be demonstrated. In general, optimal algorithms do not exist, and $e$ -optimal algorithms require artificial randomization.

Keywords

Finite-memory methods; Learning systems; Sequential decision procedures; Stochastic automata; Bayesian methods; H infinity control; Helium; Probability distribution; Random variables; Statistics; Testing; Upper bound;

fLanguage

English

Journal_Title

Information Theory, IEEE Transactions on

Publisher

ieee

ISSN

0018-9448

Type

jour

DOI

10.1109/TIT.1970.1054427

Filename

1054427

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=913562