DocumentCode
173336
Title
Multi-objective reinforcement learning for acquiring all Pareto optimal policies simultaneously - Method of determining scalarization weights
Author
Iima, Hitoshi; Kuroe, Yasuaki
Author_Institution
Department of Information Science, Kyoto Institute of Technology, Kyoto, Japan
fYear
2014
fDate
5-8 Oct. 2014
Firstpage
876
Lastpage
881
Abstract
We recently proposed a multi-objective reinforcement learning method that acquires all Pareto optimal policies simultaneously by introducing the concept of convex hulls into the Q-learning method. In this method, state-action value vectors are obtained through a single learning run, and each Pareto optimal policy is then derived by scalarizing the obtained state-action value vectors with a weight vector. Learning never has to be repeated: all Pareto optimal policies can be found by determining the weight vectors adequately and using them to scalarize the learned state-action value vectors. This paper proposes a method of determining these scalarization weight vectors. The performance of the proposed method is evaluated through numerical experiments.
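The scalarization step the abstract describes can be pictured in a few lines of code. The following is a minimal sketch, not the authors' implementation: it assumes the learned convex hulls are stored as arrays of vertex vectors per state-action pair, and the names hull_q, scalarize, and greedy_policy are illustrative assumptions.

# Minimal sketch (assumed representation, not the paper's code) of deriving
# a Pareto optimal policy from value VECTORS learned once, via a weight w.
import numpy as np

def scalarize(hull_q, w):
    """Scalar Q(s, a) = max over hull vertices q of w . q.

    hull_q: dict mapping (state, action) -> array of shape (k, n_objectives)
            holding the vertex vectors of the convex hull of value vectors.
    w:      weight vector of shape (n_objectives,).
    """
    return {sa: float(np.max(vertices @ w)) for sa, vertices in hull_q.items()}

def greedy_policy(hull_q, w, states, actions):
    """Return the policy that is greedy w.r.t. the scalarized Q-values."""
    q = scalarize(hull_q, w)
    return {s: max(actions, key=lambda a: q[(s, a)]) for s in states}

# Toy example: two states, two actions, two objectives.
hull_q = {
    ("s0", "a0"): np.array([[1.0, 0.0], [0.0, 1.0]]),  # two hull vertices
    ("s0", "a1"): np.array([[0.6, 0.6]]),
    ("s1", "a0"): np.array([[0.2, 0.9]]),
    ("s1", "a1"): np.array([[0.9, 0.2]]),
}
states, actions = ["s0", "s1"], ["a0", "a1"]

# Different weight vectors pick out different Pareto optimal policies
# from the SAME learned value vectors -- no re-learning is needed.
for w in (np.array([0.9, 0.1]), np.array([0.5, 0.5]), np.array([0.1, 0.9])):
    print(w, greedy_policy(hull_q, w, states, actions))

The paper's contribution is the rule for choosing the weight vectors w in the loop above so that, together, they recover all Pareto optimal policies.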
Keywords
Pareto optimisation; convex programming; learning (artificial intelligence); mathematics computing; vectors; Pareto optimal policies; Q-learning method; convex hulls; multiobjective reinforcement learning; scalarization weight vectors; state-action value vectors; Equations; Learning systems; Markov processes; Mathematical model; Pareto optimal policy; multi-objective problem; reinforcement learning
fLanguage
English
Publisher
IEEE
Conference_Title
2014 IEEE International Conference on Systems, Man and Cybernetics (SMC)
Conference_Location
San Diego, CA, USA
Type
conf
DOI
10.1109/SMC.2014.6974022
Filename
6974022
Link To Document