DocumentCode
913562
Title
The two-armed-bandit problem with time-invariant finite memory
Author
Cover, Thomas M. ; Hellman, Martin E.
Volume
16
Issue
2
fYear
1970
fDate
3/1/1970 12:00:00 AM
Firstpage
185
Lastpage
195
Abstract
This paper solves the classical two-armed-bandit problem under the finite-memory constraint described below. Given are probability densities
and
, and two experiments
and
. It is not known which density is associated with which experiment. Thus the experimental outcome
of experiment
is as likely to be distributed according to
as it is to be distributed according to
. It is desired to sequentially choose an experiment to be performed on the basis of past observations according to the algorithm
, where
is the state of memory at time
is the choice of experiment, and
, is the random variable observation. The goal is to maximize the asymptotic proportion
of uses of the experiment associated with density
. Let
, and let
and
denote the almost everywhere greatest lower bound and least upper bound on
. Let
. Then the optimal value of
, over all
-state algorithms
, will be shown to be
. An
-optimal family of
-state algorithms will be demonstrated. In general, optimal algorithms do not exist, and
-optimal algorithms require artificial randomization.
and
, and two experiments
and
. It is not known which density is associated with which experiment. Thus the experimental outcome
of experiment
is as likely to be distributed according to
as it is to be distributed according to
. It is desired to sequentially choose an experiment to be performed on the basis of past observations according to the algorithm
, where
is the state of memory at time
is the choice of experiment, and
, is the random variable observation. The goal is to maximize the asymptotic proportion
of uses of the experiment associated with density
. Let
, and let
and
denote the almost everywhere greatest lower bound and least upper bound on
. Let
. Then the optimal value of
, over all
-state algorithms
, will be shown to be
. An
-optimal family of
-state algorithms will be demonstrated. In general, optimal algorithms do not exist, and
-optimal algorithms require artificial randomization.Keywords
Finite-memory methods; Learning systems; Sequential decision procedures; Stochastic automata; Bayesian methods; H infinity control; Helium; Probability distribution; Random variables; Statistics; Testing; Upper bound;
fLanguage
English
Journal_Title
Information Theory, IEEE Transactions on
Publisher
ieee
ISSN
0018-9448
Type
jour
DOI
10.1109/TIT.1970.1054427
Filename
1054427
Link To Document