• DocumentCode
    173336
  • Title
    Multi-objective reinforcement learning for acquiring all Pareto optimal policies simultaneously - Method of determining scalarization weights
  • Author
    Iima, Hitoshi; Kuroe, Yasuaki

  • Author_Institution
    Dept. of Inf. Sci., Kyoto Inst. of Technol., Kyoto, Japan
  • fYear
    2014
  • fDate
    5-8 Oct. 2014
  • Firstpage
    876
  • Lastpage
    881
  • Abstract
    We recently proposed a multi-objective reinforcement learning method that acquires all Pareto optimal policies simultaneously by introducing the concept of convex hulls into the Q-learning method. In this method, state-action value vectors are learned only once, and each Pareto optimal policy is then derived by scalarizing the learned state-action value vectors with a weight vector. The method therefore requires no repeated learning: it finds all the Pareto optimal policies provided that the weight vectors used in the scalarization are determined adequately. This paper proposes a method of determining these scalarization weight vectors. The performance of the proposed method is evaluated through numerical experiments.
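    As a rough illustration of the scalarization step the abstract describes (a sketch, not the authors' implementation), the Python snippet below derives a greedy policy from vector-valued state-action values by taking the inner product with a weight vector; the function name, array shapes, and example values are assumptions made for illustration only.

    ```python
    import numpy as np

    def greedy_policy_from_weights(Q, w):
        """Pick, in each state, the action maximizing the scalarized value w . Q[s, a].

        Q : ndarray of shape (n_states, n_actions, n_objectives), learned once
        w : ndarray of shape (n_objectives,), a scalarization weight vector
        """
        scalarized = Q @ w  # inner product over objectives -> shape (n_states, n_actions)
        return scalarized.argmax(axis=1)

    # Toy example (made-up values): 3 states, 2 actions, 2 objectives.
    Q = np.array([[[1.0, 0.0], [0.0, 1.0]],
                  [[0.5, 0.5], [0.2, 0.9]],
                  [[0.3, 0.1], [0.4, 0.0]]])

    # Different weight vectors applied to the same learned Q yield different
    # policies, so the learning itself is done only once.
    for w in ([1.0, 0.0], [0.5, 0.5], [0.0, 1.0]):
        print(w, greedy_policy_from_weights(Q, np.asarray(w)))
    ```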
  • Keywords
    Pareto optimisation; convex programming; learning (artificial intelligence); mathematics computing; vectors; Pareto optimal policies; Q-learning method; convex hulls; multiobjective reinforcement learning; scalarization weight vectors; state-action value vectors; Equations; Learning (artificial intelligence); Learning systems; Markov processes; Mathematical model; Pareto optimization; Vectors; Pareto optimal policy; multi-objective problem; reinforcement learning
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2014 IEEE International Conference on Systems, Man and Cybernetics (SMC)
  • Conference_Location
    San Diego, CA
  • Type
    conf
  • DOI
    10.1109/SMC.2014.6974022
  • Filename
    6974022