Title :
Near optimal output feedback control of nonlinear discrete-time systems based on reinforcement neural network learning
Author :
Qiming Zhao ; Hao Xu ; Sarangapani Jagannathan
Author_Institution :
DENSO Int. America Inc., Southfield, MI, USA
Abstract :
In this paper, the output feedback based finite-horizon near optimal regulation of nonlinear affine discrete-time systems with unknown system dynamics is considered by using neural networks (NNs) to approximate the solution of the Hamilton-Jacobi-Bellman (HJB) equation. First, an NN-based Luenberger observer is proposed to reconstruct both the system states and the control coefficient matrix. Next, a reinforcement learning methodology with an actor-critic structure is utilized to approximate the time-varying solution of the HJB equation, referred to as the value function, by using an NN. To satisfy the terminal constraint, a new error term is defined and incorporated in the NN update law so that the terminal constraint error is also minimized over time. An NN with constant weights and a time-dependent activation function is employed to approximate the time-varying value function, which is subsequently utilized to generate the control policy; owing to NN reconstruction errors, the resulting finite-horizon policy is near optimal rather than optimal. The proposed scheme functions in a forward-in-time manner without an offline training phase. Lyapunov analysis is used to investigate the stability of the overall closed-loop system. Simulation results are given to show the effectiveness and feasibility of the proposed method.
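As background, the finite-horizon optimal regulation problem for an affine discrete-time system x_{k+1} = f(x_k) + g(x_k) u_k is typically posed as follows; the quadratic-in-control stage cost with weights Q(.) and R and the terminal penalty \psi(.) are standard illustrative assumptions, not details taken from the paper:

    V(x_k, k) = \min_{u_k} \Big[ Q(x_k) + u_k^\top R\, u_k + V(x_{k+1}, k+1) \Big], \qquad V(x_N, N) = \psi(x_N),

    u_k^* = -\tfrac{1}{2} R^{-1} g(x_k)^\top \, \frac{\partial V(x_{k+1}, k+1)}{\partial x_{k+1}}.

The terminal condition V(x_N, N) = \psi(x_N) is the terminal constraint that the additional error term in the NN update law is designed to enforce; since f and g are unknown, the observer supplies estimates of x_k and g(x_k) so that the control policy can still be evaluated. A minimal numerical sketch of such a critic update, assuming a critic of the form V(x, k) = W^T phi(x, k) with constant weights and a time-dependent activation (the feature map phi, the learning rate alpha, and the helper critic_update below are hypothetical illustrations, not the authors' exact update law):

    import numpy as np

    def phi(x, k, N):
        # Hypothetical time-dependent activation: quadratic features of the
        # state augmented with a bias term and the normalized time-to-go.
        z = np.concatenate([x, [1.0, (N - k) / N]])
        return np.outer(z, z)[np.triu_indices(len(z))]

    def critic_update(W, x_k, u_k, x_next, k, N, x_term, psi_term, Q, R, alpha=0.05):
        # Bellman (temporal-difference) residual of the finite-horizon cost.
        stage_cost = x_k @ Q @ x_k + u_k @ R @ u_k
        e_bellman = stage_cost + W @ phi(x_next, k + 1, N) - W @ phi(x_k, k, N)
        # Terminal-constraint error: the critic evaluated at time N should
        # match the terminal penalty psi(x_term), mirroring the extra error
        # term the abstract incorporates into the NN update law.
        feats_term = phi(x_term, N, N)
        e_terminal = W @ feats_term - psi_term
        # One gradient-descent step on the sum of squared errors.
        grad = (e_bellman * (phi(x_next, k + 1, N) - phi(x_k, k, N))
                + e_terminal * feats_term)
        return W - alpha * grad

In such a scheme the weights W would be adjusted online at every time step, with the observer's state estimates standing in for x_k and x_next, which is what allows the method to run forward in time without an offline training phase.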
Keywords :
Lyapunov methods; closed loop systems; discrete time systems; feedback; learning (artificial intelligence); matrix algebra; neurocontrollers; nonlinear systems; observers; optimal control; partial differential equations; stability; time-varying systems; transfer functions; HJB equation solution; Hamilton-Jacobi-Bellman equation solution; Lyapunov analysis; NN reconstruction errors; NN update law; NN-based Luenberger observer; actor-critic structure; closed-loop system stability; control coefficient matrix; error term; finite-horizon near optimal control policy; near optimal output feedback control; nonlinear affine discrete-time systems; output feedback based finite-horizon near optimal regulation; reinforcement neural network learning; system state reconstruction; terminal constraint; time-dependent activation function; time-varying solution; time-varying value function; unknown system dynamics; value function; Approximation methods; Artificial neural networks; Feedback; Learning (artificial intelligence); Nonlinear dynamical systems; Observers; Optimal control; Finite-horizon; Hamilton-Jacobi-Bellman equation; approximate dynamic programming; neural network; optimal regulation;
Journal_Title :
IEEE/CAA Journal of Automatica Sinica
DOI :
10.1109/JAS.2014.7004665