Regional Information Center for Science and Technology - Qualitative Adaptive Reward Learning With Success Failure Maps: Applied to Humanoid Robot Walking

DocumentCode :

80686

Title :

Qualitative Adaptive Reward Learning With Success Failure Maps: Applied to Humanoid Robot Walking

Author :

Nassour, John ; Hugel, Vincent ; Ouezdou, F.B. ; Cheng, Gordon

Author_Institution :

Institute for Cognitive Systems, Technical University of Munich, Munich, Germany

Volume :

Issue :

fYear :

2013

fDate :

Jan. 2013

Firstpage :

Lastpage :

Abstract :

In the human brain, rewards are encoded in a flexible and adaptive way after each novel stimulus. Neurons of the orbitofrontal cortex are the key reward structure of the brain. Neurobiological studies show that the anterior cingulate cortex is primarily responsible for avoiding repeated mistakes. According to the vigilance threshold, which denotes the tolerance to risk, we can distinguish between a learning mechanism that takes risks and one that averts them. The tolerance to risk plays an important role in such a learning mechanism. Results have shown differences in learning capacity between risk-taking and risk-averse behaviors. These neurological properties provide promising inspiration for reward-based robot learning. In this paper, we propose a learning mechanism that learns from both negative and positive feedback with adaptive reward coding. It is composed of two phases: evaluation and decision making. In the evaluation phase, we use a Kohonen self-organizing map to represent success and failure. Decision making is based on an early warning mechanism that makes it possible to avoid repeating past mistakes. The behavior toward risk is modulated in order to gain experience of both success and failure. The success map is learned with an adaptive reward that qualifies the learned task in order to optimize efficiency. Our approach is demonstrated through an implementation on the NAO humanoid robot, controlled by a bioinspired neural controller based on a central pattern generator. The learning system adapts the oscillation frequency and the motor neuron gain in pitch and roll in order to walk on flat and sloped terrain and to switch between them.
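The evaluation and early-warning steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class KohonenMap, the method early_warning, the 1-D Gaussian-neighborhood update, the learning rate and width values, and the rule that a warning fires when a candidate parameter vector lies within a vigilance radius of a failure prototype already activated by a past trial are all hypothetical assumptions chosen for illustration. In this sketch a larger vigilance radius makes the learner more cautious (risk-averse) and a smaller one more risk-taking; the paper's exact definition of the vigilance threshold may differ.

```python
import math
import random


class KohonenMap:
    """Minimal 1-D Kohonen self-organizing map (hypothetical sketch, not the paper's code)."""

    def __init__(self, n_nodes, dim, seed=0):
        rng = random.Random(seed)
        # Each node holds a prototype vector in the (normalized) parameter space.
        self.weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
        # Count how often each node has been the best-matching unit (BMU).
        self.hits = [0] * n_nodes

    def best_matching_unit(self, x):
        # Index of the node whose prototype is closest to x (Euclidean distance).
        return min(range(len(self.weights)), key=lambda i: math.dist(self.weights[i], x))

    def update(self, x, lr=0.5, sigma=1.0):
        # Kohonen rule: pull the BMU and its grid neighbors toward x,
        # scaled by a Gaussian of their 1-D grid distance to the BMU.
        bmu = self.best_matching_unit(x)
        self.hits[bmu] += 1
        for i, w in enumerate(self.weights):
            h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
            self.weights[i] = [wj + lr * h * (xj - wj) for wj, xj in zip(w, x)]
        return bmu

    def early_warning(self, x, vigilance):
        # Hypothetical rule: warn if x lies within `vigilance` of any prototype
        # that has actually been activated by a past (failed) trial.
        return any(
            count > 0 and math.dist(w, x) < vigilance
            for w, count in zip(self.weights, self.hits)
        )


# Usage: x is a hypothetical normalized (frequency, pitch gain, roll gain) vector.
failure_map = KohonenMap(n_nodes=5, dim=3)
for _ in range(20):
    failure_map.update([0.9, 0.9, 0.9])  # repeated failures at one setting
print(failure_map.early_warning([0.9, 0.9, 0.9], vigilance=0.2))  # True
print(failure_map.early_warning([0.1, 0.1, 0.1], vigilance=0.2))  # False
```

Keeping a failure map separate from a success map lets the warning test ignore prototypes that no failed trial has ever activated, so an untrained map never blocks exploration.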

Keywords :

decision making; feedback; gait analysis; humanoid robots; intelligent robots; mobile robots; neurocontrollers; risk management; self-organising feature maps; Kohonen self-organizing map technique; NAO humanoid robot; anterior cingulate cortex; bioinspired neural controller; central pattern generator; early warning mechanism; efficiency optimization; failure representation; flat terrain; human brain; humanoid robot walking; motor neuron gain; negative feedback; neurological property; orbitofrontal cortex neurons; oscillation frequency; positive feedback; qualitative adaptive reward learning; reward encoding; risk tolerance; risk-avert behavior; risk-taking behavior; robot learning capacity; sloped terrain; success failure maps; success representation; vigilance threshold; Humanoid robots; Humans; Learning systems; Legged locomotion; Neurons; Vectors; Experience-based learning mechanism; humanoid learning; neurorobotics

fLanguage :

English

Journal_Title :

IEEE Transactions on Neural Networks and Learning Systems

Publisher :

IEEE

ISSN :

2162-237X

Type :

Journal article

DOI :

10.1109/TNNLS.2012.2224370

Filename :

6365318

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=80686