DocumentCode :
251136
Title :
Combining learned controllers to achieve new goals based on linearly solvable MDPs
Author :
Uchibe, Eiji ; Doya, Kenji
Author_Institution :
Okinawa Inst. of Sci. & Technol. Grad. Univ., Okinawa, Japan
fYear :
2014
fDate :
May 31 2014-June 7 2014
Firstpage :
5252
Lastpage :
5259
Abstract :
Learning complicated behaviors usually involves intensive manual tuning and expensive computational optimization because a nonlinear Hamilton-Jacobi-Bellman (HJB) equation has to be solved. Recently, Todorov proposed a class of problems called Linearly solvable Markov Decision Processes (LMDPs), which converts the nonlinear HJB equation into a linear differential equation. The linearity of the simplified HJB equation allows superposition to be applied to derive a new composite controller from a set of learned primitive controllers. However, his method was model-based and was not evaluated in a real domain. This study proposes a model-free method similar to Least Squares Temporal Difference (LSTD) learning, in which the exponentially transformed cost function can be regarded as the discount factor in LSTD. The proposed method is applied to learning walking behaviors with a quadruped robot and is evaluated in real-robot experiments. The goal of each primitive task is to reach a specific target position in the environment, and that of the composite task is to approach an arbitrary region represented by the primitives' target positions. Experimental results show that the composite policy can be used as a good initial policy for the new task.
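The superposition property mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's model-free LSTD-style method: it assumes a known, discrete first-exit LMDP (a 7-state random-walk chain) and solves the linear desirability equation z = exp(-q) P z directly. The chain, the state cost q, the terminal costs g1 and g2, the weights w, and the helper solve_desirability are all hypothetical choices made for illustration.

```python
import numpy as np


def solve_desirability(P, q, terminal, g_terminal):
    """Solve the linear LMDP equation for the desirability z = exp(-v).

    Interior states satisfy z = exp(-q) * (P @ z); terminal states are
    fixed to z = exp(-g). P holds the passive (uncontrolled) dynamics.
    """
    n = P.shape[0]
    interior = np.setdiff1d(np.arange(n), terminal)
    z = np.zeros(n)
    z[terminal] = np.exp(-g_terminal)
    M = np.diag(np.exp(-q[interior]))
    # (I - M P_II) z_I = M P_IT z_T
    A = np.eye(len(interior)) - M @ P[np.ix_(interior, interior)]
    b = M @ P[np.ix_(interior, terminal)] @ z[terminal]
    z[interior] = np.linalg.solve(A, b)
    return z


# Hypothetical toy problem: random walk on a chain, exits at both ends.
n = 7
terminal = np.array([0, n - 1])
P = np.zeros((n, n))
for i in range(1, n - 1):
    P[i, i - 1] = P[i, i + 1] = 0.5
P[0, 0] = P[n - 1, n - 1] = 1.0  # terminal rows are never used by the solver
q = np.full(n, 0.1)              # assumed uniform state cost

# Two primitive tasks that differ only in their terminal costs.
g1 = np.array([0.0, 5.0])  # primitive 1 prefers the left end
g2 = np.array([5.0, 0.0])  # primitive 2 prefers the right end
z1 = solve_desirability(P, q, terminal, g1)
z2 = solve_desirability(P, q, terminal, g2)

# Composite task: exp(-g_c) is a weighted sum of the primitives' exp(-g_i).
w = np.array([0.3, 0.7])
g_composite = -np.log(w[0] * np.exp(-g1) + w[1] * np.exp(-g2))
z_composite = solve_desirability(P, q, terminal, g_composite)

# Superposition: the same weights applied to the primitive solutions.
z_superposed = w[0] * z1 + w[1] * z2
print(np.allclose(z_composite, z_superposed))
```

Because the equation is linear in z, the composite desirability is obtained without re-solving the problem, and in this setting the corresponding policy is proportional to P[x, x'] * z[x'].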
Keywords :
Markov processes; learning (artificial intelligence); least squares approximations; mobile robots; LSTD learning; Markov decision process; composite policy; discount factor; exponentially transformed cost function; learned controllers; least squares temporal difference; linearly solvable MDP; model-free method; model-free reinforcement learning; quadruped robot; robot walking behaviors learning; Computational modeling; Cost function; Equations; Learning (artificial intelligence); Mathematical model; Robots; Springs;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Robotics and Automation (ICRA), 2014 IEEE International Conference on
Conference_Location :
Hong Kong
Type :
conf
DOI :
10.1109/ICRA.2014.6907631
Filename :
6907631