Regional Information Center for Science and Technology (RICEST) - Optimal learning with non-Gaussian rewards

DocumentCode :

3028797

Title :

Optimal learning with non-Gaussian rewards

Author :

Ding, Zi; Ryzhov, Ilya O.

Author_Institution :

Dept. of Math., Univ. of Maryland, College Park, MD, USA

Year :

2013

Date :

8-11 Dec. 2013

Firstpage :

631

Lastpage :

642

Abstract :

We propose a theoretical and computational framework for approximating the optimal policy in multi-armed bandit problems where the reward distributions are non-Gaussian. We first construct a probabilistic interpolation of the sequence of discrete-time rewards in the form of a continuous-time conditional Lévy process. In the Gaussian setting, this approach allows an easy connection to Brownian motion and its convenient time-change properties. No such device is available for non-Gaussian rewards; however, we show how optimal stopping theory can be used to characterize the value of the optimal policy, using a free-boundary partial integro-differential equation, for exponential and Poisson rewards. We then solve this problem numerically to approximate the set of belief states possessing a given optimal index value, and provide illustrations showing that the solution behaves as expected.
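The belief states mentioned in the abstract can be illustrated with a minimal sketch. It assumes conjugate Gamma priors on the unknown rate of each arm, which the abstract does not state explicitly; the names GammaBelief, update_poisson, and update_exponential are hypothetical and chosen only for illustration. The sketch covers the discrete-time belief update only, not the continuous-time interpolation or the free-boundary problem.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GammaBelief:
    """Gamma(shape=a, rate=b) belief about an arm's unknown rate parameter.

    Assumption: a conjugate Gamma prior; the abstract does not specify the prior.
    """
    a: float
    b: float

    def mean_rate(self) -> float:
        # Mean of a Gamma(a, b) distribution in the shape/rate parameterization.
        return self.a / self.b


def update_poisson(belief: GammaBelief, reward: int) -> GammaBelief:
    # Poisson(rate) reward with a Gamma(a, b) prior gives Gamma(a + reward, b + 1).
    return GammaBelief(belief.a + reward, belief.b + 1.0)


def update_exponential(belief: GammaBelief, reward: float) -> GammaBelief:
    # Exponential(rate) reward with a Gamma(a, b) prior gives Gamma(a + 1, b + reward).
    return GammaBelief(belief.a + 1.0, belief.b + reward)


if __name__ == "__main__":
    prior = GammaBelief(a=2.0, b=1.0)
    print(update_poisson(prior, 3))
    print(update_exponential(prior, 0.5))
```

In this sketch a belief state is the pair (a, b), so an index policy would assign a value to each such pair; computing that value is the part the paper addresses and is not attempted here.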

Keywords :

Brownian motion; integro-differential equations; interpolation; statistical distributions; stochastic processes; Gaussian setting; Poisson rewards; computational framework; continuous-time conditional Lévy process; discrete-time rewards; exponential rewards; free-boundary partial integro-differential equation; multi-armed bandit problems; non-Gaussian rewards; optimal index value; optimal learning; probabilistic interpolation; reward distributions; time-change properties; Educational institutions; Equations; Indexes; Mathematical model; Probabilistic logic

Language :

English

Publisher :

IEEE

Conference_Title :

2013 Winter Simulation Conference (WSC)

Conference_Location :

Washington, DC

Print_ISBN :

978-1-4799-2077-8

Type :

conf

DOI :

10.1109/WSC.2013.6721457

Filename :

6721457

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3028797