  • DocumentCode
    2029005
  • Title
    Transfer learning for direct policy search: A reward shaping approach
  • Author
    Doncieux, Stephane
  • Author_Institution
    ISIR, Univ. Pierre et Marie Curie - Paris 6, Paris, France
  • fYear
    2013
  • fDate
    18-22 Aug. 2013
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    In the perspective of lifelong learning, a robot may face different but related situations. Being able to exploit the knowledge acquired during a first learning phase may be critical for solving more complex tasks. This is the transfer learning problem, which is addressed here in the case of direct policy search algorithms. Neither discrete states nor discrete actions are defined a priori. A policy is described by a controller that computes the orders to be sent to the motors from sensor values. Both motor and sensor values can be continuous. The proposed approach relies on population-based direct policy search algorithms, i.e. evolutionary algorithms, and exploits the numerous behaviors that are generated during the search. When learning on the source task, a knowledge base is built. The knowledge base aims at identifying the most salient behavior segments with regard to the considered task. Afterwards, the knowledge base is exploited on a target task with a reward shaping approach: besides its reward on the task, a policy is credited with a reward computed from the knowledge base. The rationale behind this approach is to automatically detect the stepping stones, i.e. the behavior segments that led to a reward in the source task, before the policy is efficient enough to get the reward on the target task. The approach is tested in simulation with a neuroevolution approach on ball collecting tasks.
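    The reward shaping idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the knowledge base is stored, how behavior segments are matched, or how the two rewards are combined, so everything below is an assumption made for illustration: the `Segment` layout, the sliding-window Euclidean matching, the `threshold` and `weight` parameters, and the weighted sum are all hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

Vector = Sequence[float]


@dataclass
class Segment:
    """A behavior segment kept in the knowledge base (hypothetical layout)."""
    points: List[Vector]  # consecutive sensor-motor vectors
    value: float          # assumed salience of the segment on the source task


def segment_distance(a: List[Vector], b: List[Vector]) -> float:
    """Mean Euclidean distance between two equal-length segments."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def knowledge_reward(trajectory: List[Vector],
                     knowledge_base: List[Segment],
                     threshold: float = 0.1) -> float:
    """Sum the values of stored segments that the trajectory reproduces.

    A segment counts as reproduced when some window of the trajectory lies
    within `threshold` of it (an assumed matching rule).
    """
    total = 0.0
    for seg in knowledge_base:
        n = len(seg.points)
        if n == 0:
            continue  # nothing to match against
        windows = (trajectory[i:i + n]
                   for i in range(len(trajectory) - n + 1))
        if any(segment_distance(w, seg.points) <= threshold for w in windows):
            total += seg.value
    return total


def shaped_reward(task_reward: float,
                  trajectory: List[Vector],
                  knowledge_base: List[Segment],
                  weight: float = 1.0,
                  threshold: float = 0.1) -> float:
    """Target-task reward plus a weighted knowledge-base reward.

    The weighted sum is an assumption; the two rewards could equally be
    kept as separate objectives.
    """
    bonus = knowledge_reward(trajectory, knowledge_base, threshold)
    return task_reward + weight * bonus
```

    In this sketch, a policy that has not yet earned any reward on the target task still receives a non-zero score if its trajectory contains a segment stored from the source task, which is one way to read the "stepping stones" rationale.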
  • Keywords
    continuing professional development; evolutionary computation; intelligent robots; knowledge acquisition; knowledge based systems; search problems; sensors; ball collecting tasks; evolutionary algorithm; knowledge base; learning phase; life long learning; motor values; neuroevolution approach; population based direct policy search algorithm; reward shaping approach; sensor values; source task; transfer learning problem; Conferences; Evolutionary computation; Knowledge based systems; Robot sensing systems; Switches; Trajectory
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Development and Learning and Epigenetic Robotics (ICDL), 2013 IEEE Third Joint International Conference on
  • Conference_Location
    Osaka
  • Type
    conf
  • DOI
    10.1109/DevLrn.2013.6652568
  • Filename
    6652568