Transfer learning for direct policy search: A reward shaping approach

Author

Doncieux, Stephane

Author_Institution

ISIR, Univ. Pierre et Marie Curie - Paris 6, Paris, France

fYear

2013

fDate

18-22 Aug. 2013

Firstpage

1

Lastpage

6

Abstract

From the perspective of lifelong learning, a robot may face different but related situations. Being able to exploit the knowledge acquired during a first learning phase may be critical for solving more complex tasks. This is the transfer learning problem, which is addressed here in the case of direct policy search algorithms. Neither discrete states nor discrete actions are defined a priori. A policy is described by a controller that computes motor commands from sensor values, and both motor and sensor values can be continuous. The proposed approach relies on population-based direct policy search algorithms, i.e. evolutionary algorithms, and exploits the numerous behaviors that are generated during the search. When learning on the source task, a knowledge base is built that aims at identifying the most salient behavior segments with regard to the considered task. Afterwards, the knowledge base is exploited on a target task with a reward shaping approach: besides its reward on the task, a policy is credited with a reward computed from the knowledge base. The rationale behind this approach is to automatically detect the stepping stones, i.e. the behavior segments that have led to a reward in the source task, before the policy is efficient enough to get the reward on the target task. The approach is tested in simulation on ball collecting tasks, using neuroevolution.
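The reward shaping idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how behavior segments are represented, how they are matched against the knowledge base, or how the two rewards are combined, so everything below is an assumption made for illustration: segments are fixed-length tuples of floats, matching uses a Euclidean distance threshold (`radius`), and the shaped reward is a weighted sum (`weight`). The function names are hypothetical and are not taken from the paper.

```python
from math import dist

# Assumption: a behavior segment is a tuple of floats, e.g. flattened
# sensor and motor values over a short time window.
# Assumption: the knowledge base is a list of (segment, score) pairs
# built while learning on the source task.


def knowledge_base_reward(trajectory, knowledge_base, radius=0.5):
    """Sum the scores of knowledge-base segments matched by the trajectory.

    Each knowledge-base entry counts at most once, so repeating the same
    segment does not inflate the reward.
    """
    total = 0.0
    for kb_segment, score in knowledge_base:
        if any(dist(seg, kb_segment) <= radius for seg in trajectory):
            total += score
    return total


def shaped_reward(task_reward, trajectory, knowledge_base, weight=1.0):
    """Task reward plus a weighted reward computed from the knowledge base.

    The additive combination is an assumption of this sketch.
    """
    return task_reward + weight * knowledge_base_reward(trajectory, knowledge_base)


# Usage: a policy that earns no reward on the target task yet is still
# credited for reproducing a segment that was rewarded on the source task.
kb = [((0.0, 1.0), 2.0), ((1.0, 1.0), 1.0)]
trajectory = [(0.1, 0.9), (5.0, 5.0)]
print(shaped_reward(0.0, trajectory, kb))
```

In this sketch the extra term gives the search a gradient toward previously useful behavior segments even while the target-task reward is still zero, which is the stepping-stone rationale given in the abstract.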

Keywords

continuing professional development; evolutionary computation; intelligent robots; knowledge acquisition; knowledge based systems; search problems; sensors; ball collecting tasks; evolutionary algorithm; knowledge base; learning phase; life long learning; motor values; neuroevolution approach; population based direct policy search algorithm; reward shaping approach; sensor values; source task; transfer learning problem; Conferences; Robot sensing systems; Switches; Trajectory

fLanguage

English

Publisher

IEEE

Conference_Titel

Development and Learning and Epigenetic Robotics (ICDL), 2013 IEEE Third Joint International Conference on

Conference_Location

Osaka

Type

conf

DOI

10.1109/DevLrn.2013.6652568

Filename

6652568