A lemma on the multiarmed bandit problem

Author

Tsitsiklis, John N.

Author_Institution

Massachusetts Institute of Technology, Cambridge, MA, USA

Volume

Issue

fYear

1986

fDate

6/1/1986

Firstpage

576

Lastpage

577

Abstract

We prove a lemma on the optimal value function for the multiarmed bandit problem which provides a simple direct proof of optimality of writeoff policies. This, in turn, leads to a new proof of optimality of the index rule.

Keywords

Optimal control; Approximation algorithms; Equations; Infinite horizon; Probability distribution; Retirement

fLanguage

English

Journal_Title

Automatic Control, IEEE Transactions on

Publisher

IEEE

ISSN

0018-9286

Type

jour

DOI

10.1109/TAC.1986.1104332

Filename

1104332

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=852905