DocumentCode :
3400257
Title :
Learning continuous-action control policies
Author :
Pazis, Jason ; Lagoudakis, Michail G.
Author_Institution :
Dept. of Electron. & Comput. Eng., Tech. Univ. of Crete, Chania
fYear :
2009
fDate :
March 30 - April 2, 2009
Firstpage :
169
Lastpage :
176
Abstract :
Reinforcement learning for control in stochastic processes has received significant attention in the last few years. Several data-efficient methods, even for continuous state spaces, have been proposed; however, most of them assume a small, discrete action space. Although continuous action spaces are quite common in real-world problems, the most common approach in practice is still coarse discretization of the action space. This paper presents a novel, computationally efficient method, called adaptive action modification, for realizing continuous-action policies using binary decisions that correspond to adaptive increment or decrement changes in the values of the continuous action variables. The proposed approach approximates any continuous action space to arbitrary resolution and can be combined with any discrete-action reinforcement learning algorithm to learn continuous-action policies. Our approach is coupled with three well-known reinforcement learning algorithms (Q-learning, fitted Q-iteration, and least-squares policy iteration), and its use and properties are thoroughly investigated and demonstrated on the continuous state-action inverted pendulum and bicycle balancing and riding domains.
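Note: the sketch below illustrates the core idea stated in the abstract, namely driving each continuous action variable with binary increment/decrement decisions whose step size adapts, so that a discrete-action learner can realize continuous actions to arbitrary resolution. The specific adaptation rule (doubling the step on a repeated direction, resetting it on a reversal) and all names here are illustrative assumptions, not the paper's exact scheme.

    # Minimal sketch of the adaptive-action-modification idea, assuming a
    # doubling/reset step-size rule (an assumption, not taken from the paper).
    class AdaptiveActionModifier:
        def __init__(self, low: float, high: float, min_step: float):
            self.low, self.high = low, high      # bounds of the action variable
            self.min_step = min_step             # finest resolution of changes
            self.step = min_step
            self.last_direction = 0
            self.value = 0.5 * (low + high)      # current continuous action

        def apply(self, increment: bool) -> float:
            """Apply one binary decision from a discrete-action RL policy."""
            direction = 1 if increment else -1
            if direction == self.last_direction:
                self.step *= 2.0                 # accelerate while consistent
            else:
                self.step = self.min_step        # reset on a direction change
            self.last_direction = direction
            # Clip the updated action to its valid range.
            self.value = min(self.high, max(self.low, self.value + direction * self.step))
            return self.value

Under this reading, a discrete-action learner such as Q-learning chooses only between "increment" and "decrement" for each action variable at every step, while the wrapper translates that binary choice into a continuous-valued action.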
Keywords :
continuous systems; discrete systems; iterative methods; learning (artificial intelligence); least squares approximations; stochastic systems; Q-learning; adaptive action modification; bicycle balancing; bicycle riding; coarse discretization; computationally-efficient method; continuous action variables; continuous state spaces; continuous state-action inverted pendulum; continuous-action control policies; data-efficient methods; discrete action space; discrete-action reinforcement learning algorithm; fitted Q-iteration; least-squares policy iteration; stochastic processes; Bicycles; Current supplies; Learning; Muscles; Process control; State-space methods; Stochastic processes; Torque; Vehicles; Vents;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL '09)
Conference_Location :
Nashville, TN
Print_ISBN :
978-1-4244-2761-1
Type :
conf
DOI :
10.1109/ADPRL.2009.4927541
Filename :
4927541