Title :
Stochastic policy gradient reinforcement learning on a simple 3D biped
Author :
Tedrake, Russ ; Zhang, Teresa Weirui ; Seung, H. Sebastian
Author_Institution :
Center for Bits & Atoms, Massachusetts Inst. of Technol., Cambridge, MA, USA
Date :
28 Sept.-2 Oct. 2004
Abstract :
We present a learning system that quickly and reliably acquires a robust feedback control policy for 3D dynamic walking from a blank slate, using only trials implemented on our physical robot. The robot begins walking within a minute, and learning converges in approximately 20 minutes. This success can be attributed to the mechanics of our robot, which are modeled after a passive dynamic walker, and to a dramatic reduction in the dimensionality of the learning problem. We reduce the dimensionality by designing a robot with only 6 internal degrees of freedom and 4 actuators, by decomposing the control system in the frontal and sagittal planes, and by formulating the learning problem on the discrete return map dynamics. We apply a stochastic policy gradient algorithm to this reduced problem and decrease the variance of the update using a state-based estimate of the expected cost. This optimized learning system works quickly enough that the robot can continually adapt to the terrain as it walks.
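The variance-reduction idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy one-step quadratic cost, a single scalar policy parameter, and a running average of the cost standing in for the state-based estimate of the expected cost. All names (`policy_gradient_step`, `update_baseline`, `run_learning`) and all parameter values are hypothetical.

```python
import random


def policy_gradient_step(w, z, cost, baseline, eta):
    """One stochastic policy gradient update (hypothetical helper).

    The parameter moves against the perturbation z in proportion to how
    much the observed cost exceeded the baseline estimate.
    """
    return w - eta * (cost - baseline) * z


def update_baseline(baseline, cost, rate):
    """Exponential running average of observed cost (assumed baseline form)."""
    return baseline + rate * (cost - baseline)


def run_learning(trials=2000, eta=0.25, sigma=0.2, rate=0.1, seed=0):
    """Run the toy learning loop and return the final parameter value."""
    rng = random.Random(seed)
    target = 1.0    # assumed optimum of the toy cost
    w = 0.0         # policy parameter, starting from a "blank slate"
    baseline = 0.0  # running estimate of the expected cost
    for _ in range(trials):
        z = rng.gauss(0.0, sigma)     # random perturbation of the parameter
        cost = (w + z - target) ** 2  # toy cost of one trial
        w = policy_gradient_step(w, z, cost, baseline, eta)
        baseline = update_baseline(baseline, cost, rate)
    return w


if __name__ == "__main__":
    print(run_learning())
```

In this sketch the expected update equals the true gradient direction whatever the baseline is; subtracting the baseline only reduces the variance of each step, which is the role the abstract assigns to the state-based cost estimate.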
Keywords :
adaptive systems; feedback; learning (artificial intelligence); learning systems; legged locomotion; optimisation; reduced order systems; robot dynamics; robust control; state estimation; 3D biped robot; 3D dynamic walking; discrete return map dynamics; optimized learning system; robust feedback control; state-based estimate; stochastic policy gradient algorithm; stochastic policy gradient reinforcement learning; Actuators; Control systems; Costs; Feedback control; Learning systems; Legged locomotion; Robots; Robust control; State estimation; Stochastic processes;
Conference_Title :
Proceedings of the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2004)
Print_ISBN :
0-7803-8463-6
DOI :
10.1109/IROS.2004.1389841