Title :
Reinforcement Learning Controller Design for Affine Nonlinear Discrete-Time Systems using Online Approximators
Author :
Yang, Qinmin ; Jagannathan, Sarangapani
Author_Institution :
Dept. of Control Sci. & Eng., Zhejiang Univ., Hangzhou, China
fDate :
4/1/2012
Abstract :
In this paper, reinforcement-learning-based adaptive critic controller designs, one using state feedback and one using output feedback, are proposed using online approximators (OLAs) for general multi-input multi-output affine unknown nonlinear discrete-time systems in the presence of bounded disturbances. Each controller has two entities: an action network that is designed to produce an optimal control signal, and a critic network that evaluates the performance of the action network. The critic estimates the cost-to-go function and is tuned online using recursive equations derived from heuristic dynamic programming. Here, neural networks (NNs) are used for both the action and critic networks, although any OLA, such as radial basis functions, splines, or fuzzy logic, can be utilized. In the output-feedback counterpart, an additional NN serves as an observer to estimate the unavailable system states; thus, the separation principle is not required. The NN weight tuning laws for both controller schemes are also derived, and uniform ultimate boundedness of the closed-loop system is ensured using Lyapunov theory. Finally, the effectiveness of the two controllers is tested in simulation on a pendulum balancing system and a two-link robotic arm system.
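The abstract describes the action/critic structure only in words. As a rough illustration, and explicitly not the authors' tuning laws (those are derived in the paper with Lyapunov-based guarantees), the sketch below trains a Gaussian radial-basis-function critic and action network with a generic heuristic dynamic programming (HDP) style update. It assumes a hypothetical scalar plant x_{k+1} = f(x_k) + g(x_k) u_k + d_k. The functions `f`, `g`, `phi`, `train_hdp`, and `rollout`, the RBF grid, the quadratic utility weights, the saturation limit `U_MAX`, and all learning rates are illustrative assumptions. The actor gradient also uses a known `g`, whereas the paper treats the dynamics as unknown.

```python
import math
import random

# Hypothetical scalar affine plant x_{k+1} = f(x_k) + g(x_k) u_k + d_k.
# f, g, the RBF layout, and every gain below are assumptions for this sketch.
def f(x):
    return 0.9 * math.sin(x) + 0.5 * x  # open-loop unstable near x = 0

def g(x):
    return 1.0  # assumed constant input gain

CENTERS = [-2.0 + 0.5 * i for i in range(9)]  # Gaussian RBF centers on [-2, 2]
WIDTH = 0.5
Q_W, R_W = 1.0, 0.1  # quadratic utility weights (assumed)
U_MAX = 2.0          # assumed actuator saturation

def phi(x):
    """Gaussian radial-basis features: one possible online approximator."""
    return [math.exp(-((x - c) ** 2) / (2 * WIDTH ** 2)) for c in CENTERS]

def dphi(x):
    """Derivative of each feature with respect to x."""
    return [p * (-(x - c) / WIDTH ** 2) for p, c in zip(phi(x), CENTERS)]

def dot(w, v):
    return sum(a * b for a, b in zip(w, v))

def policy(wa, x):
    """Action network u = wa . phi(x), clipped to the actuator limits."""
    return max(-U_MAX, min(U_MAX, dot(wa, phi(x))))

def utility(x, u):
    return Q_W * x * x + R_W * u * u

def train_hdp(steps=2000, gamma=0.9, lr_c=0.05, lr_a=0.005, dist=0.01, seed=0):
    rng = random.Random(seed)
    wc = [0.0] * len(CENTERS)  # critic: J_hat(x) = wc . phi(x)
    wa = [0.0] * len(CENTERS)  # actor:  u(x) = wa . phi(x)
    x = 1.5
    for k in range(steps):
        p = phi(x)
        u = policy(wa, x)
        x_next = f(x) + g(x) * u + rng.uniform(-dist, dist)  # bounded disturbance
        # HDP critic step: move J_hat(x_k) toward r(x_k, u_k) + gamma * J_hat(x_{k+1}).
        e_c = dot(wc, p) - (utility(x, u) + gamma * dot(wc, phi(x_next)))
        wc = [w - lr_c * e_c * pj for w, pj in zip(wc, p)]
        # Actor step: descend d/du [r(x, u) + gamma * J_hat(x_{k+1})] using the
        # affine structure dx_{k+1}/du = g(x_k); saturation is ignored here.
        dq_du = 2 * R_W * u + gamma * dot(wc, dphi(x_next)) * g(x)
        wa = [w - lr_a * dq_du * pj for w, pj in zip(wa, p)]
        x = x_next
        if (k + 1) % 100 == 0:  # restart from a random state to widen exploration
            x = rng.uniform(-1.5, 1.5)
    return wc, wa

def rollout(wa, x0=1.5, steps=30):
    """Disturbance-free closed-loop trajectory under the learned action network."""
    xs = [x0]
    for _ in range(steps):
        xs.append(f(xs[-1]) + g(xs[-1]) * policy(wa, xs[-1]))
    return xs

if __name__ == "__main__":
    wc, wa = train_hdp()
    print("final state of closed-loop rollout:", rollout(wa)[-1])
```

Linear-in-parameter approximators keep both updates as simple gradient steps on a weight vector. Swapping `phi` for splines or fuzzy membership functions changes only the feature map, which mirrors the abstract's point that any OLA can be used. The clipping in `policy` bounds the state, since |f(x)| ≤ 0.9 + 0.5|x| keeps |x| ≤ 2(0.9 + U_MAX) from a small initial condition. It is a convenience of this sketch, not part of the paper's design.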
Keywords :
Lyapunov methods; adaptive control; approximation theory; closed loop systems; control system synthesis; discrete time systems; dynamic programming; learning systems; neurocontrollers; nonlinear control systems; state feedback; Lyapunov theory; NN weight tuning laws; closed loop system; critic network; fuzzy logic; heuristic dynamic programming; multiinput affine unknown nonlinear discrete time systems; multioutput affine unknown nonlinear discrete time systems; neural networks; online approximators; output feedback based adaptive critic controller designs; pendulum balancing system; radial basis functions; recursive equations; reinforcement learning controller design; splines; state feedback based adaptive critic controller designs; two-link robotic arm system; Approximation error; Artificial neural networks; Cost function; Equations; Learning; Mathematical model; Adaptive critic; Lyapunov method; dynamic programming (DP); neural networks (NNs); online approximators (OLAs); online learning; reinforcement learning;
Journal_Title :
Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on
DOI :
10.1109/TSMCB.2011.2166384