• DocumentCode
    2222256
  • Title
    Risk-sensitivity through multi-objective reinforcement learning
  • Author
    Van Moffaert, Kristof ; Brys, Tim ; Nowe, Ann
  • Author_Institution
    Department of Computer Science, Vrije Universiteit Brussel, Brussels, Belgium
  • fYear
    2015
  • fDate
    25-28 May 2015
  • Firstpage
    1746
  • Lastpage
    1753
  • Abstract
    Usually in reinforcement learning, the goal of the agent is to maximize the expected return. However, in practical applications, algorithms that focus solely on maximizing the mean return could be inappropriate, as they do not account for the variability of their solutions. Therefore, a variability measure could be included to accommodate a risk-sensitive setting, i.e., one where the system engineer can explicitly define the tolerated level of variance. Our approach is based on multi-objectivization, where a standard single-objective environment is extended with one (or more) additional objectives. More precisely, we augment the standard feedback signal of an environment with an additional objective that defines the variance of the solution. We highlight that our algorithm, named risk-sensitive Pareto Q-learning, (1) is specifically tailored to learn a set of Pareto non-dominated policies that trade off these two objectives. Additionally, (2) the algorithm can retrieve every policy that has been learned throughout the state-action space. This is in contrast to standard risk-sensitive approaches, where only a single trade-off between mean and variance is learned at a time.
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Evolutionary Computation (CEC), 2015 IEEE Congress on
  • Conference_Location
    Sendai, Japan
  • Type
    conf
  • DOI
    10.1109/CEC.2015.7257098
  • Filename
    7257098