Title :
Policy gradient approaches for multi-objective sequential decision making: A comparison
Author :
Parisi, Simone ; Pirotta, Matteo ; Smacchia, Nicola ; Bascetta, Luca ; Restelli, Marcello
Author_Institution :
Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milan, Italy
Abstract :
This paper investigates the use of policy gradient techniques to approximate the Pareto frontier in Multi-Objective Markov Decision Processes (MOMDPs). Despite the popularity of policy-gradient algorithms, and although gradient-ascent methods have already been proposed to numerically solve multi-objective optimization problems, especially in combination with multi-objective evolutionary algorithms, little attention has so far been paid to the use of gradient information in multi-objective sequential decision problems. Three Multi-Objective Reinforcement-Learning (MORL) approaches are presented here. The first two, called radial and Pareto-following, start from an initial policy and perform gradient-based policy-search procedures aimed at finding a set of non-dominated policies. The third approach, in contrast, performs a single gradient-ascent run that, at each step, generates an improved continuous approximation of the Pareto frontier: the parameters of a function defining a manifold in the policy-parameter space are updated along the gradient of a performance criterion, so that the sequence of candidate solutions gets as close as possible to the Pareto front. Besides reviewing the three approaches and discussing their main properties, we empirically compare them with other MORL algorithms on two interesting MOMDPs.
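To make the radial idea concrete, the following is a minimal Python sketch of one plausible reading of the abstract: each run ascends a fixed convex combination of the per-objective gradients, one run per direction on the simplex, and the resulting solutions are filtered for Pareto dominance. The toy quadratic objectives J and their gradients grad_J are hypothetical stand-ins for Monte-Carlo policy-gradient estimates on an MOMDP and are not taken from the paper.

    # Sketch of a radial-style multi-objective gradient ascent (assumptions noted above).
    import numpy as np

    def J(theta):
        """Hypothetical bi-objective function of the policy parameters."""
        j1 = -np.sum((theta - np.array([1.0, 0.0])) ** 2)   # peak at (1, 0)
        j2 = -np.sum((theta - np.array([0.0, 1.0])) ** 2)   # peak at (0, 1)
        return np.array([j1, j2])

    def grad_J(theta):
        """Gradients of the toy objectives (stand-in for policy-gradient estimates)."""
        g1 = -2.0 * (theta - np.array([1.0, 0.0]))
        g2 = -2.0 * (theta - np.array([0.0, 1.0]))
        return np.stack([g1, g2])  # shape: (num_objectives, num_params)

    def radial(theta0, num_directions=11, steps=200, lr=0.05):
        """Run one gradient ascent per direction lambda on the simplex."""
        solutions = []
        for w in np.linspace(0.0, 1.0, num_directions):
            lam = np.array([w, 1.0 - w])
            theta = theta0.copy()
            for _ in range(steps):
                theta += lr * lam @ grad_J(theta)   # ascend the scalarized objective
            solutions.append(J(theta))
        return np.array(solutions)

    def non_dominated(points):
        """Keep only points not dominated by any other point (maximization)."""
        keep = []
        for i, p in enumerate(points):
            dominated = any(np.all(q >= p) and np.any(q > p)
                            for j, q in enumerate(points) if j != i)
            if not dominated:
                keep.append(p)
        return np.array(keep)

    if __name__ == "__main__":
        frontier = non_dominated(radial(np.zeros(2)))
        print(frontier)

Running the script prints a small set of non-dominated objective vectors, one per search direction; the Pareto-following and manifold-based approaches described in the abstract differ in how the sequence of candidate policies is generated, not in this dominance-filtering step.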
Keywords :
Pareto optimisation; approximation theory; decision making; evolutionary computation; gradient methods; learning (artificial intelligence); MOMDPs; MORL approaches; Pareto following; Pareto frontier approximation; gradient-ascent algorithms; gradient-based policy-search procedures; multiobjective Markov decision processes; multiobjective evolutionary algorithms; multiobjective optimization problems; multiobjective reinforcement-learning approaches; multiobjective sequential decision making; nondominated policies; performance criterion; policy gradient approaches; policy-gradient algorithms; radial following; Algorithm design and analysis; Approximation algorithms; Approximation methods; Manifolds; Measurement; Optimization; Water resources;
Conference_Title :
2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Conference_Location :
Orlando, FL
DOI :
10.1109/ADPRL.2014.7010618