Title :
Convergent reinforcement learning control with neural networks and continuous action search
Author :
Lee, Minwoo; Anderson, Charles W.
Author_Institution :
Dept. of Comput. Sci., Colorado State Univ., Fort Collins, CO, USA
Abstract :
We combine a convergent TD-learning method and direct continuous action search with neural networks for function approximation to obtain both stability and generalization over state-action pairs that have not yet been experienced. We extend linear Greedy-GQ to nonlinear neural networks for convergent learning. Direct continuous action search with back-propagation leads to efficient, high-precision control. A high-dimensional continuous state and action problem, octopus arm control, is used to test the proposed algorithm. Comparing TD, linear Greedy-GQ, and nonlinear Greedy-GQ, we discuss how the correction term contributes to learning with the nonlinear Greedy-GQ algorithm and how continuous action search contributes to learning speed and stability.
Keywords :
backpropagation; continuous time systems; dexterous manipulators; function approximation; generalisation (artificial intelligence); greedy algorithms; neurocontrollers; search problems; stability; TD algorithm; convergent TD-learning method; convergent reinforcement learning control; correction term; direct continuous action search; generalization; high-dimensional continuous state; high-precision control; learning speed; linear Greedy-GQ algorithm; linear Greedy-GQ networks; nonlinear Greedy-GQ algorithm; nonlinear neural networks; octopus arm control; state-action pairs; temporal difference learning; Approximation algorithms; Learning (artificial intelligence); Legged locomotion; Neural networks; Vectors
Conference_Title :
Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 2014 IEEE Symposium on
Conference_Location :
Orlando, FL
DOI :
10.1109/ADPRL.2014.7010612