• DocumentCode
    3028797
  • Title
    Optimal learning with non-Gaussian rewards
  • Author
    Ding, Zi; Ryzhov, Ilya O.
  • Author_Institution
    Dept. of Math., Univ. of Maryland, College Park, MD, USA
  • fYear
    2013
  • fDate
    8-11 Dec. 2013
  • Firstpage
    631
  • Lastpage
    642
  • Abstract
    We propose a theoretical and computational framework for approximating the optimal policy in multi-armed bandit problems where the reward distributions are non-Gaussian. We first construct a probabilistic interpolation of the sequence of discrete-time rewards in the form of a continuous-time conditional Lévy process. In the Gaussian setting, this approach connects easily to Brownian motion and its convenient time-change properties. No such device is available for non-Gaussian rewards; however, for exponential and Poisson rewards, we show how optimal stopping theory can be used to characterize the value of the optimal policy through a free-boundary partial integro-differential equation. We then solve this problem numerically to approximate the set of belief states possessing a given optimal index value, and provide illustrations showing that the solution behaves as expected.
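    The abstract refers to "belief states" for exponential and Poisson rewards without stating the prior. A common choice, assumed here and not confirmed by this record, is a conjugate gamma prior on the unknown rate, so each arm's belief is a pair (a, b) updated in closed form after every observation. The sketch below shows only that discrete-time belief update; it does not implement the paper's Lévy-process interpolation or the free-boundary equation. All names (GammaBelief, update_poisson, update_exponential, expected_exponential_reward) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GammaBelief:
    """Hypothetical Gamma(a, b) belief on an unknown rate (shape a, rate b)."""
    a: float
    b: float

    def mean(self) -> float:
        # Posterior mean of the rate parameter itself.
        return self.a / self.b


def update_poisson(belief: GammaBelief, count: int) -> GammaBelief:
    # Assumed model: count ~ Poisson(lam), lam ~ Gamma(a, b).
    # Conjugacy gives the posterior Gamma(a + count, b + 1).
    return GammaBelief(belief.a + count, belief.b + 1.0)


def update_exponential(belief: GammaBelief, x: float) -> GammaBelief:
    # Assumed model: x ~ Exponential(lam) with rate lam (mean 1/lam),
    # lam ~ Gamma(a, b). Conjugacy gives the posterior Gamma(a + 1, b + x).
    return GammaBelief(belief.a + 1.0, belief.b + x)


def expected_exponential_reward(belief: GammaBelief) -> float:
    # The exponential reward has mean 1/lam; under Gamma(a, b),
    # E[1/lam] = b / (a - 1), which is finite only when a > 1.
    if belief.a <= 1.0:
        raise ValueError("E[1/lam] is infinite when a <= 1")
    return belief.b / (belief.a - 1.0)


# Usage: two Poisson counts (3, then 1) from a Gamma(2, 1) prior,
# and two exponential rewards (0.5, then 1.5) from the same prior.
poisson_post = update_poisson(update_poisson(GammaBelief(2.0, 1.0), 3), 1)
exp_post = update_exponential(update_exponential(GammaBelief(2.0, 1.0), 0.5), 1.5)
print(poisson_post, poisson_post.mean())          # Gamma(6, 3), mean rate 2.0
print(exp_post, expected_exponential_reward(exp_post))  # Gamma(4, 3), mean reward 1.0
```

    Under this assumption the belief state stays two-dimensional for every arm, which is what makes a set of belief states sharing a given index value a natural object to approximate numerically.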
  • Keywords
    Brownian motion; integro-differential equations; interpolation; statistical distributions; stochastic processes; Gaussian setting; Poisson rewards; computational framework; continuous-time conditional Lévy process; discrete-time rewards; exponential rewards; free-boundary partial integro-differential equation; multi-armed bandit problems; non-Gaussian rewards; optimal index value; optimal learning; probabilistic interpolation; reward distributions; time-change properties; Educational institutions; Equations; Indexes; Interpolation; Mathematical model; Probabilistic logic
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Simulation Conference (WSC), 2013 Winter
  • Conference_Location
    Washington, DC
  • Print_ISBN
    978-1-4799-2077-8
  • Type
    conf
  • DOI
    10.1109/WSC.2013.6721457
  • Filename
    6721457