• DocumentCode
    730314
  • Title
    Risk-averse online learning under mean-variance measures
  • Author
    Vakili, Sattar; Zhao, Qing
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of California, Davis, Davis, CA, USA
  • fYear
    2015
  • fDate
    19-24 April 2015
  • Firstpage
    1911
  • Lastpage
    1915
  • Abstract
    We study risk-averse multi-armed bandit problems under mean-variance measures. We consider two risk mitigation models. In the first model, the variations in the reward values obtained at different times are considered as risk, and the objective is to minimize the mean-variance of the observed rewards. In the second model, the quantity of interest is the total reward at the end of the time horizon, and the objective is to minimize the mean-variance of the total reward. Under both models, we establish asymptotic as well as finite-time lower bounds on regret and develop online learning algorithms that achieve the lower bounds.
  • Keywords
    learning (artificial intelligence); minimisation; risk analysis; finite time lower bound; mean variance measure; mean variance minimisation; online learning; risk averse multi-armed bandit problem; risk mitigation model; time horizon; Multi-armed bandit; mean-variance; regret; risk-aversion;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
  • Conference_Location
    South Brisbane, QLD
  • Type
    conf
  • DOI
    10.1109/ICASSP.2015.7178303
  • Filename
    7178303