DocumentCode :
2222256
Title :
Risk-sensitivity through multi-objective reinforcement learning
Author :
Van Moffaert, Kristof ; Brys, Tim ; Nowé, Ann
Author_Institution :
Department of Computer Science, Vrije Universiteit Brussel, Brussels, Belgium
fYear :
2015
fDate :
25-28 May 2015
Firstpage :
1746
Lastpage :
1753
Abstract :
Usually in reinforcement learning, the goal of the agent is to maximize the expected return. However, in practical applications, algorithms that focus solely on maximizing the mean return can be inappropriate, as they do not account for the variability of their solutions. A variability measure can therefore be included to accommodate a risk-sensitive setting, i.e., one in which the system engineer explicitly defines the tolerated level of variance. Our approach is based on multi-objectivization, where a standard single-objective environment is extended with one or more additional objectives. More precisely, we augment the standard feedback signal of the environment with an additional objective that captures the variance of the solution. Our algorithm, named risk-sensitive Pareto Q-learning, (1) is specifically tailored to learn a set of Pareto non-dominated policies that trade off these two objectives, and (2) can retrieve every policy learned throughout the state-action space. This is in contrast to standard risk-sensitive approaches, where only a single trade-off between mean and variance is learned at a time.
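The following is a minimal Python sketch of the multi-objectivization idea described in the abstract, not the authors' risk-sensitive Pareto Q-learning implementation: each action's outcome is summarized as a vector (mean return, negative variance), so that "higher is better" holds for both objectives, and the set of Pareto non-dominated vectors forms the mean-variance trade-off front. The toy actions, payoff distributions, and sample counts are illustrative assumptions.

```python
import random
from statistics import mean, pvariance

def dominates(a, b):
    """True if vector a Pareto-dominates vector b (>= in all objectives, > in at least one)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(vectors):
    """Filter a list of objective vectors down to the non-dominated set."""
    return [v for v in vectors if not any(dominates(w, v) for w in vectors)]

# Toy single-state problem (hypothetical): a low-variance and a high-variance action.
ACTIONS = {
    "safe":  lambda: random.gauss(1.0, 0.1),   # modest mean, low variance
    "risky": lambda: random.gauss(1.5, 2.0),   # higher mean, high variance
}

random.seed(0)
samples = {name: [act() for _ in range(2000)] for name, act in ACTIONS.items()}

# Multi-objectivization: summarize each action as (mean return, -variance),
# so that both objectives are maximized.
values = {name: (mean(r), -pvariance(r)) for name, r in samples.items()}

front = pareto_front(list(values.values()))
for name, v in values.items():
    status = "non-dominated" if v in front else "dominated"
    print(f"{name}: mean={v[0]:.2f} variance={-v[1]:.2f} -> {status}")
```

With these example settings, each action is better on one objective, so both end up non-dominated; this two-element front is a toy analogue of the set of mean-variance trade-off policies the paper's algorithm learns, whereas an action that is worse on both objectives would be filtered out.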
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
2015 IEEE Congress on Evolutionary Computation (CEC)
Conference_Location :
Sendai, Japan
Type :
conf
DOI :
10.1109/CEC.2015.7257098
Filename :
7257098