Regional Information Center for Science and Technology (RICeST) - Beyond exponential utility functions: A variance-adjusted approach for risk-averse reinforcement learning

DocumentCode :

1799349

Title :

Beyond exponential utility functions: A variance-adjusted approach for risk-averse reinforcement learning

Author :

Gosavi, Abhijit A. ; Das, Sajal K. ; Murray, Susan L.

Author_Institution :

Dept. of Eng. Manage. & Syst. Eng., Missouri Univ. of Sci. & Technol., Rolla, MO, USA

fYear :

2014

fDate :

9-12 Dec. 2014

Firstpage :

Lastpage :

Abstract :

Utility theory has served as a bedrock for modeling risk in economics. When Markov decision processes (MDPs) that involve risk are solved via utility theory, the exponential utility (EU) function has been used in the literature as the objective function for capturing risk-averse behavior. The EU framework uses a so-called risk-averseness coefficient (RAC) that seeks to quantify the risk appetite of the decision-maker. Unfortunately, as we show in this paper, the EU framework suffers from computational deficiencies that prevent it from being useful in practice for solution methods based on reinforcement learning (RL). In particular, the value function becomes very large, and the computation typically overflows. We provide a simple example to demonstrate this. Further, we show empirically how a variance-adjusted (VA) approach, which approximates the EU objective for reasonable values of the RAC, can be used in an RL algorithm. The VA framework in a sense has two objectives: maximize expected returns and minimize variance. We conduct empirical studies of a VA-based RL algorithm on the semi-MDP (SMDP), which is a more general version of the MDP. We conclude with a mathematical proof of the boundedness of the iterates in our algorithm.
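The two points made in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm or their example: the function names, the sample returns, the sign convention (returns to be maximized) and the factor of one half in the variance penalty are all assumptions made here for illustration. The sketch relies only on the standard second-order expansion of the exponential-utility certainty equivalent, -(1/theta) log E[exp(-theta X)] ≈ E[X] - (theta/2) Var[X] for small theta, and on the fact that evaluating exp() on large magnitudes exceeds floating-point range.

```python
import math
import random
import statistics


def eu_certainty_equivalent(returns, theta):
    """Exponential-utility certainty equivalent, evaluated naively.

    Computes -(1/theta) * log(mean(exp(-theta * r))). The exp() call
    raises OverflowError once -theta * r exceeds roughly 709.
    """
    mean_exp = sum(math.exp(-theta * r) for r in returns) / len(returns)
    return -math.log(mean_exp) / theta


def variance_adjusted_score(returns, theta):
    """Assumed variance-adjusted surrogate: mean - (theta / 2) * variance."""
    mu = statistics.fmean(returns)
    var = statistics.pvariance(returns, mu)
    return mu - 0.5 * theta * var


def eu_overflows(returns, theta):
    """Return True if the naive EU evaluation overflows."""
    try:
        eu_certainty_equivalent(returns, theta)
    except OverflowError:
        return True
    return False


random.seed(0)
# Hypothetical per-episode returns, not data from the paper.
returns = [random.gauss(10.0, 2.0) for _ in range(10_000)]
theta = 0.05  # hypothetical risk-averseness coefficient

ce = eu_certainty_equivalent(returns, theta)
va = variance_adjusted_score(returns, theta)
print(f"EU certainty equivalent: {ce:.4f}")
print(f"Variance-adjusted score: {va:.4f}")

# A large accumulated cost (negative return) makes exp() overflow,
# while the variance-adjusted score stays finite.
large_costs = [-20_000.0, -19_000.0]
print("EU overflows on large costs:", eu_overflows(large_costs, theta))
print("VA score on large costs:", variance_adjusted_score(large_costs, theta))
```

For small theta the two printed scores should agree closely, while the last two lines show the naive EU evaluation failing on inputs for which the variance-adjusted score is still computable.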

Keywords :

Markov processes; decision making; economics; learning (artificial intelligence); mathematical analysis; risk analysis; utility theory; EU function; MDP; Markov decision process; RAC; VA approach; exponential utility functions; mathematical proof; risk-averse reinforcement learning; risk-averseness coefficient; variance-adjusted approach; Computers; Equations; Linear programming; Mathematical model; Measurement

fLanguage :

English

Publisher :

IEEE

Conference_Title :

2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Conference_Location :

Orlando, FL

Type :

conf

DOI :

10.1109/ADPRL.2014.7010645

Filename :

7010645

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1799349