Conference record number:
3862
Paper title:
Learning a model-free robotic continuous state-action task through contractive Q-network
Authors:
MohammadJavad Davari Dolatabadi (University of Tehran), Khalil Alipour (k.alipour@ut.ac.ir, University of Tehran), Alireza Hadi (University of Tehran), Bahram Tarvirdizadeh (University of Tehran)
Keywords:
Reinforcement Learning, Neural Network, Biped Robot, Deterministic Policy Gradient, Continuous State-Action
Conference title:
25th Annual International Conference on Mechanical Engineering
Abstract:
The main purpose of this paper is to ease the path toward autonomously learning robots using reinforcement learning (RL). The deterministic policy gradient algorithm is chosen because it handles model-free problems with continuous state-action spaces. A neural network serves as the function approximator (FA) for both the actor and the critic. A novel method, called the contractive Q-network, is proposed for updating the critic FA (the Q-network). Because this method requires fewer samples to learn a task, it is well suited to this context. To demonstrate the efficiency of the developed method, two illustrative examples are presented: first in the well-known puddle world, and then in a Push Recovery (PR) task on a simulated humanoid robot, in which the robot learns to recover from pushes of varying direction and magnitude.
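The actor-critic structure the abstract describes can be illustrated with a minimal sketch of one deterministic-policy-gradient step. Everything here is an illustrative assumption: the linear "networks", the toy one-dimensional environment, and all hyperparameters are invented for the example, and the critic uses a standard TD(0) update rather than the paper's contractive Q-network update, which the abstract does not specify.

```python
import numpy as np

# Toy DPG actor-critic on a 1-D continuous state-action problem.
# Actor mu(s) = wa * s (deterministic policy); critic Q(s, a) = wq . [s, a, 1].
# These linear approximators stand in for the neural networks in the paper.

rng = np.random.default_rng(0)

wa = 0.0               # actor parameter
wq = np.zeros(3)       # critic parameters
gamma = 0.99           # discount factor (assumed)
alpha_q, alpha_a = 0.1, 0.01   # critic / actor learning rates (assumed)

def mu(s):
    """Deterministic policy."""
    return wa * s

def q(s, a):
    """Critic value estimate."""
    return wq @ np.array([s, a, 1.0])

def step(s, a):
    """Invented toy dynamics: reward is best when the action cancels the state."""
    return -(a + s) ** 2, s + a

s = 1.0
for _ in range(200):
    a = mu(s) + 0.1 * rng.standard_normal()       # exploration noise on the action
    r, s2 = step(s, a)
    # Critic: TD(0) update toward the bootstrapped target r + gamma * Q(s', mu(s'))
    td = r + gamma * q(s2, mu(s2)) - q(s, a)
    wq += alpha_q * td * np.array([s, a, 1.0])
    # Actor: deterministic policy gradient, dQ/da * dmu/dwa = wq[1] * s
    wa += alpha_a * wq[1] * s
    s = s2 if abs(s2) < 5.0 else 1.0              # crude reset if the state diverges
```

The key DPG feature shown is the chain rule in the actor update: the critic's action gradient is pushed back through the policy's parameters instead of sampling actions stochastically, which is what makes the method usable in continuous action spaces.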