Finite-Approximation-Error-Based Discrete-Time Iterative Adaptive Dynamic Programming

Author

Qinglai Wei ; Fei-Yue Wang ; Derong Liu ; Xiong Yang

Author_Institution

State Key Lab. of Manage. & Control for Complex Syst., Inst. of Autom., Beijing, China

Volume

44

Issue

12

fYear

2014

fDate

Dec. 2014

Firstpage

2820

Lastpage

2833

Abstract

In this paper, a new iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal control problems for infinite-horizon discrete-time nonlinear systems with finite approximation errors. First, a new generalized value iteration algorithm of ADP is developed to make the iterative performance index function converge to the solution of the Hamilton-Jacobi-Bellman equation. The generalized value iteration algorithm can be initialized with an arbitrary positive semi-definite function, which relaxes the initialization requirement of traditional value iteration algorithms. For the case in which the iterative control law and the iterative performance index function cannot be obtained accurately in each iteration, a design method for the convergence criteria of the finite-approximation-error-based generalized value iteration algorithm is established for the first time. A suitable approximation error can be designed adaptively to make the iterative performance index function converge to a finite neighborhood of the optimal performance index function. Neural networks are used to implement the iterative ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the developed method.
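
As a rough illustration of the value iteration idea summarized in the abstract, the sketch below uses a toy scalar linear-quadratic problem, for which each value iteration step reduces to a Riccati recursion on the coefficient p of V_i(x) = p * x**2. It is not the paper's nonlinear, neural-network-based implementation. The system parameters a, b, q, r and the multiplicative error factor sigma are hypothetical choices made only for this example.

```python
# Toy instance (assumption): x_{k+1} = a*x_k + b*u_k with stage cost q*x^2 + r*u^2.
# With V_i(x) = p_i * x^2, minimizing over u gives the scalar Riccati recursion.

def vi_step(p, a=1.2, b=1.0, q=1.0, r=1.0):
    """One exact value iteration step on the coefficient p of V_i(x) = p * x**2."""
    return q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)


def iterate(p0, steps=200, sigma=1.0):
    """Run value iteration from an arbitrary initial coefficient p0 >= 0.

    sigma is a hypothetical multiplicative factor standing in for a bounded
    per-iteration approximation error; sigma = 1.0 means no error.
    """
    p = p0
    for _ in range(steps):
        p = sigma * vi_step(p)
    return p


# Two different positive semi-definite initial functions (p0 = 0 and p0 = 10).
p_from_zero = iterate(0.0)
p_from_large = iterate(10.0)

# The same iteration with a constant 5% overestimate in every step.
p_with_error = iterate(0.0, sigma=1.05)

print(p_from_zero, p_from_large, p_with_error)
```

In this toy case both error-free runs settle on the same fixed point regardless of the initial function, while the run with sigma = 1.05 settles at a nearby but different value, which mirrors the abstract's notion of convergence to a finite neighborhood of the optimal performance index function.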

Keywords

adaptive control; approximation theory; discrete time systems; dynamic programming; infinite horizon; iterative methods; neurocontrollers; nonlinear control systems; optimal control; Hamilton-Jacobi-Bellman equation; finite approximation error; finite neighborhood; finite-approximation-error-based discrete-time iterative adaptive dynamic programming; finite-approximation-error-based generalized value iteration algorithm; infinite horizon discrete-time nonlinear systems; iterative ADP algorithm; iterative adaptive dynamic programming algorithm; iterative control law; iterative performance index function; neural networks; optimal control problem; optimal performance index function; positive semidefinite function; traditional value iteration algorithm; Adaptive critic designs; adaptive dynamic programming (ADP); approximate dynamic programming; approximation error; neuro-dynamic programming; nonlinear systems; reinforcement learning; value iteration

fLanguage

English

Journal_Title

Cybernetics, IEEE Transactions on

Publisher

IEEE

ISSN

2168-2267

Type

jour

DOI

10.1109/TCYB.2014.2354377

Filename

6912005