Risk-averse online learning under mean-variance measures

Author

Vakili, Sattar ; Zhao, Qing

Author_Institution

Dept. of Electr. & Comput. Eng., Univ. of California, Davis, Davis, CA, USA

fYear

2015

fDate

19-24 April 2015

Firstpage

1911

Lastpage

1915

Abstract

We study risk-averse multi-armed bandit problems under mean-variance measures. We consider two risk mitigation models. In the first model, the variations in the reward values obtained at different times are treated as risk, and the objective is to minimize the mean-variance of the observed rewards. In the second model, the quantity of interest is the total reward at the end of the time horizon, and the objective is to minimize the mean-variance of the total reward. Under both models, we establish asymptotic as well as finite-time lower bounds on regret and develop online learning algorithms that achieve the lower bounds.
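The two risk models in the abstract can be illustrated with a minimal Python sketch. It assumes the common convention that mean-variance is variance minus a risk-tolerance factor times the mean, so lower is better. That form, the parameter name `rho`, and both function names are assumptions made for illustration and are not taken from the paper.

```python
from statistics import fmean, pvariance


def observed_reward_mean_variance(rewards, rho):
    """First model (assumed form): variance of the per-step rewards within
    one run, minus rho times their mean."""
    return pvariance(rewards) - rho * fmean(rewards)


def total_reward_mean_variance(totals, rho):
    """Second model (assumed form): variance of the end-of-horizon total
    reward across independent runs, minus rho times its mean."""
    return pvariance(totals) - rho * fmean(totals)


# Two hypothetical reward sequences with the same mean but different variation.
volatile = [1.0, 0.0, 1.0, 0.0]
steady = [0.5, 0.5, 0.5, 0.5]

mv_volatile = observed_reward_mean_variance(volatile, rho=1.0)
mv_steady = observed_reward_mean_variance(steady, rho=1.0)

# Hypothetical total rewards collected over three independent runs.
mv_totals = total_reward_mean_variance([2.0, 2.0, 2.0], rho=1.0)
```

Under this convention the steady sequence scores lower than the volatile one even though both have the same average reward, which is the sense in which variation over time counts as risk in the first model.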

Keywords

learning (artificial intelligence); minimisation; risk analysis; finite time lower bound; mean variance measure; mean variance minimisation; online learning; risk averse multi-armed bandit problem; risk mitigation model; time horizon; Multi-armed bandit; mean-variance; regret; risk-aversion;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on

Conference_Location

South Brisbane, QLD

Type

conf

DOI

10.1109/ICASSP.2015.7178303

Filename

7178303

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=730314