• DocumentCode
    1799310
  • Title
    Policy gradient approaches for multi-objective sequential decision making: A comparison
  • Author
    Parisi, Simone; Pirotta, Matteo; Smacchia, Nicola; Bascetta, Luca; Restelli, Marcello
  • Author_Institution
    Dept. of Electron., Inf. & Bioeng., Politec. di Milano, Milan, Italy
  • fYear
    2014
  • fDate
    9-12 Dec. 2014
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    This paper investigates the use of policy gradient techniques to approximate the Pareto frontier in Multi-Objective Markov Decision Processes (MOMDPs). Policy-gradient algorithms are popular, and gradient-ascent algorithms have already been proposed to solve multi-objective optimization problems numerically, especially in combination with multi-objective evolutionary algorithms. Even so, little attention has been paid to using gradient information to tackle multi-objective sequential decision problems. Three different Multi-Objective Reinforcement-Learning (MORL) approaches are presented here. The first two, called radial and Pareto following, start from an initial policy and perform gradient-based policy-search procedures aimed at finding a set of non-dominated policies. In contrast, the third approach performs a single gradient-ascent run that, at each step, generates an improved continuous approximation of the Pareto frontier. The parameters of a function that defines a manifold in the policy parameter space are updated following the gradient of a performance criterion, so that the sequence of candidate solutions gets as close as possible to the Pareto front. Besides reviewing the three approaches and discussing their main properties, we empirically compare them with other MORL algorithms on two MOMDPs.
  • Keywords
    Pareto optimisation; approximation theory; decision making; evolutionary computation; gradient methods; learning (artificial intelligence); MOMDPs; MORL approaches; Pareto following; Pareto frontier approximation; gradient-ascent algorithms; gradient-based policy-search procedures; multiobjective Markov decision processes; multiobjective evolutionary algorithms; multiobjective optimization problems; multiobjective reinforcement-learning approaches; multiobjective sequential decision making; nondominated policies; performance criterion; policy gradient approaches; policy-gradient algorithms; radial following; Algorithm design and analysis; Approximation algorithms; Approximation methods; Manifolds; Measurement; Optimization; Water resources;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 2014 IEEE Symposium on
  • Conference_Location
    Orlando, FL
  • Type
    conf
  • DOI
    10.1109/ADPRL.2014.7010618
  • Filename
    7010618
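The abstract describes gradient ascent that starts from an initial policy and aims to collect a set of non-dominated policies. The Python sketch below illustrates only that general idea. It is not the authors' radial, Pareto-following, or manifold-based algorithm. It relies on several assumptions of its own: a hypothetical one-step problem with a Gaussian policy a ~ N(theta, sigma^2), two hypothetical rewards r1 = -(a - 1)^2 and r2 = -(a + 1)^2, fixed linear scalarization weights lam, sampled likelihood-ratio (REINFORCE) gradient estimates, and a final Pareto-dominance filter. All function names and parameters are hypothetical.

```python
import random


def weighted_reward(action: float, lam: float) -> float:
    """Hypothetical scalarized reward lam * r1 + (1 - lam) * r2,
    with r1 = -(a - 1)^2 and r2 = -(a + 1)^2."""
    r1 = -(action - 1.0) ** 2
    r2 = -(action + 1.0) ** 2
    return lam * r1 + (1.0 - lam) * r2


def expected_returns(theta: float, sigma: float) -> tuple[float, float]:
    """Closed-form (J1, J2) for the Gaussian policy a ~ N(theta, sigma^2)."""
    return (-(theta - 1.0) ** 2 - sigma**2, -(theta + 1.0) ** 2 - sigma**2)


def reinforce_ascent(lam: float, theta0: float = 0.0, sigma: float = 0.5,
                     alpha: float = 0.05, iters: int = 200, batch: int = 200,
                     seed: int = 0) -> float:
    """Gradient ascent on the scalarized return, starting from the initial
    policy theta0, using sampled likelihood-ratio (REINFORCE) gradients."""
    rng = random.Random(seed)
    theta = theta0
    for _ in range(iters):
        actions = [rng.gauss(theta, sigma) for _ in range(batch)]
        rewards = [weighted_reward(a, lam) for a in actions]
        baseline = sum(rewards) / batch  # batch-mean baseline reduces variance
        # Score function: d/dtheta log N(a; theta, sigma^2) = (a - theta) / sigma^2
        grad = sum((r - baseline) * (a - theta) / sigma**2
                   for a, r in zip(actions, rewards)) / batch
        theta += alpha * grad
    return theta


def dominates(u, v) -> bool:
    """u Pareto-dominates v (maximization): no worse everywhere, better somewhere."""
    return all(x >= y for x, y in zip(u, v)) and any(x > y for x, y in zip(u, v))


def non_dominated(points):
    """Keep only the points that no other point dominates."""
    return [p for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]


sigma = 0.5
weights = [0.0, 0.25, 0.5, 0.75, 1.0]
# One ascent run per weight; each scalarization is maximized at theta = 2*lam - 1.
thetas = [reinforce_ascent(lam, sigma=sigma, seed=i) for i, lam in enumerate(weights)]
front = non_dominated([expected_returns(t, sigma) for t in thetas])
for lam, t in zip(weights, thetas):
    print(f"lambda={lam:.2f}  theta={t:+.3f}")
print(f"{len(front)} non-dominated policies")
```

A fixed-weight linear scalarization like this one can only reach points on convex regions of a front, and it needs one run per weight. The third approach summarized in the abstract instead updates, in a single gradient-ascent run, the parameters of a manifold that approximates the whole frontier.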