• DocumentCode
    776463
  • Title
    Discrete-Time Nonlinear HJB Solution Using Approximate Dynamic Programming: Convergence Proof
  • Author
    Al-Tamimi, Asma; Lewis, Frank L.; Abu-Khalaf, Murad

  • Author_Institution
    Hashemite Univ., Zarqa
  • Volume
    38
  • Issue
    4
  • fYear
    2008
  • Firstpage
    943
  • Lastpage
    949
  • Abstract
    Convergence of the value-iteration-based heuristic dynamic programming (HDP) algorithm is proven for general nonlinear systems. That is, HDP is shown to converge to the optimal control and the optimal value function that solve the Hamilton-Jacobi-Bellman (HJB) equation appearing in infinite-horizon discrete-time (DT) nonlinear optimal control. It is assumed that, at each iteration, the value and action update equations can be solved exactly. Two standard neural networks (NNs) are used: a critic NN approximates the value function, while an action NN approximates the optimal control policy. Notably, this approach allows HDP to be implemented without knowing the internal dynamics of the system. The exact-solution assumption holds for some classes of nonlinear systems and, in particular, for the DT linear quadratic regulator (LQR), where the action is linear and the value is quadratic in the states, so the NNs have zero approximation error. It is stressed that, for the LQR, HDP may be implemented without knowing the system A matrix by using two NNs. This fact is not generally appreciated in the folklore of HDP for the DT LQR, where only one critic NN is typically used.
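    The LQR special case summarized in the abstract can be sketched numerically. With exact value and action updates, HDP value iteration reduces to the Riccati difference iteration P_{i+1} = Q + A'P_i A - A'P_i B (R + B'P_i B)^{-1} B'P_i A starting from P_0 = 0, which converges to the solution of the discrete algebraic Riccati equation. The system matrices below are illustrative assumptions, not taken from the paper, and this model-based sketch uses A directly, unlike the model-free two-NN implementation the authors stress:

    ```python
    # Minimal sketch of value-iteration HDP for the DT LQR special case.
    # Assumed illustrative system matrices (not from the paper).
    import numpy as np

    A = np.array([[0.9, 0.1],
                  [0.0, 0.8]])   # state dynamics (stable, for quick convergence)
    B = np.array([[0.0],
                  [0.1]])        # input matrix
    Q = np.eye(2)                # state cost
    R = np.array([[1.0]])        # control cost

    P = np.zeros((2, 2))         # value iteration starts from V_0 = 0
    for i in range(500):
        # Greedy policy update: u_i(x) = -K_i x with K_i = (R + B'PB)^{-1} B'PA
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Value update: P_{i+1} = Q + A'PA - A'PB K_i
        P_next = Q + A.T @ P @ A - A.T @ P @ B @ K
        if np.linalg.norm(P_next - P) < 1e-12:
            P = P_next
            break
        P = P_next

    # At convergence, P satisfies the discrete algebraic Riccati equation.
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    residual = np.linalg.norm(Q + A.T @ P @ A - A.T @ P @ B @ K - P)
    print("DARE residual:", residual)
    ```

    The iteration monotonically builds up the value from V_0 = 0, mirroring the convergence argument for the general nonlinear case.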
  • Keywords
    approximation theory; convergence of numerical methods; discrete time systems; dynamic programming; infinite horizon; iterative methods; linear quadratic control; neurocontrollers; nonlinear control systems; approximate dynamic programming; discrete-time linear quadratic regulator; discrete-time nonlinear Hamilton-Jacobi-Bellman equation; infinite-horizon discrete-time nonlinear optimal control; neural network; nonlinear system; optimal value function; value-iteration-based heuristic dynamic programming convergence; Adaptive critics; Hamilton Jacobi Bellman (HJB); approximate dynamic programming (ADP); policy iteration; value iteration; Algorithms; Computer Simulation; Feedback; Models, Theoretical; Programming, Linear; Systems Theory;
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
  • Publisher
    IEEE
  • ISSN
    1083-4419
  • Type
    jour
  • DOI
    10.1109/TSMCB.2008.926614
  • Filename
    4554208