Title :
Optimal learning with non-Gaussian rewards
Author :
Ding, Zi; Ryzhov, Ilya O.
Author_Institution :
Dept. of Math., Univ. of Maryland, College Park, MD, USA
Abstract :
We propose a theoretical and computational framework for approximating the optimal policy in multi-armed bandit problems where the reward distributions are non-Gaussian. We first construct a probabilistic interpolation of the sequence of discrete-time rewards in the form of a continuous-time conditional Lévy process. In the Gaussian setting, this approach allows an easy connection to Brownian motion and its convenient time-change properties. No such device is available for non-Gaussian rewards; however, we show how optimal stopping theory can be used to characterize the value of the optimal policy, using a free-boundary partial integro-differential equation, for exponential and Poisson rewards. We then solve this problem numerically to approximate the set of belief states possessing a given optimal index value, and provide illustrations showing that the solution behaves as expected.
Keywords :
Brownian motion; integro-differential equations; interpolation; statistical distributions; stochastic processes; Gaussian setting; Poisson rewards; computational framework; continuous-time conditional Lévy process; discrete-time rewards; exponential rewards; free-boundary partial integro-differential equation; multi-armed bandit problems; non-Gaussian rewards; optimal index value; optimal learning; probabilistic interpolation; reward distributions; time-change properties; Educational institutions; Equations; Indexes; Mathematical model; Probabilistic logic
Conference_Title :
2013 Winter Simulation Conference (WSC)
Conference_Location :
Washington, DC
Print_ISBN :
978-1-4799-2077-8
DOI :
10.1109/WSC.2013.6721457