DocumentCode
730314
Title
Risk-averse online learning under mean-variance measures
Author
Vakili, Sattar ; Zhao, Qing
Author_Institution
Dept. of Electr. & Comput. Eng., Univ. of California, Davis, Davis, CA, USA
fYear
2015
fDate
19-24 April 2015
Firstpage
1911
Lastpage
1915
Abstract
We study risk-averse multi-armed bandit problems under mean-variance measures. We consider two risk mitigation models. In the first model, the variations in the reward values obtained at different times are regarded as risk, and the objective is to minimize the mean-variance of the observed rewards. In the second model, the quantity of interest is the total reward at the end of the time horizon, and the objective is to minimize the mean-variance of the total reward. Under both models, we establish asymptotic as well as finite-time lower bounds on regret and develop online learning algorithms that achieve these lower bounds.
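The two risk models differ in which sample the mean-variance is computed over. The sketch below illustrates this difference and is not the authors' algorithm. It assumes a common definition from the risk-averse bandit literature, mean-variance = variance - rho * mean (lower is better), where rho is a hypothetical risk-tolerance coefficient, and it uses hypothetical Gaussian arms. The paper's exact normalization may differ. All function names are illustrative. Model 1 scores the reward sequence of a single run. Model 2 scores the total reward across independent runs.

```python
import random
import statistics


def mean_variance(values, rho):
    """Empirical mean-variance of a sample: population variance minus rho * mean.

    ASSUMPTION: definition xi = sigma^2 - rho * mu with a hypothetical
    risk-tolerance coefficient rho; lower values are preferred.
    """
    mu = statistics.fmean(values)
    return statistics.pvariance(values, mu) - rho * mu


def model1_mv(rewards, rho):
    """Model 1: risk is the variation of rewards across time steps within one run."""
    return mean_variance(rewards, rho)


def model2_mv(total_rewards, rho):
    """Model 2: risk is the spread of the end-of-horizon total reward across runs."""
    return mean_variance(total_rewards, rho)


def simulate_run(arm_means, arm_sd, policy, horizon, rng):
    """Hypothetical setup: play arm policy(t) at each step t; rewards are Gaussian."""
    return [rng.gauss(arm_means[policy(t)], arm_sd) for t in range(horizon)]


if __name__ == "__main__":
    rng = random.Random(0)
    alternate = lambda t: t % 2  # switch between arm 0 and arm 1 every step
    runs = [simulate_run([1.0, 3.0], 0.5, alternate, 100, rng) for _ in range(200)]
    print("Model 1 (one run):", model1_mv(runs[0], rho=1.0))
    print("Model 2 (totals): ", model2_mv([sum(r) for r in runs], rho=1.0))
```

Under Model 1, switching between arms with different means raises the variance of the per-step reward sequence. Under Model 2, only the distribution of the summed reward across runs matters.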
Keywords
learning (artificial intelligence); minimisation; risk analysis; finite time lower bound; mean variance measure; mean variance minimisation; online learning; risk averse multi-armed bandit problem; risk mitigation model; time horizon; Multi-armed bandit; mean-variance; regret; risk-aversion;
fLanguage
English
Publisher
ieee
Conference_Title
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location
South Brisbane, QLD
Type
conf
DOI
10.1109/ICASSP.2015.7178303
Filename
7178303