DocumentCode :
58712
Title :
Online Synchronous Approximate Optimal Learning Algorithm for Multi-Player Non-Zero-Sum Games With Unknown Dynamics
Author :
Derong Liu ; Hongliang Li ; Ding Wang
Author_Institution :
State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Beijing, China
Volume :
44
Issue :
8
fYear :
2014
fDate :
Aug. 2014
Firstpage :
1015
Lastpage :
1027
Abstract :
In this paper, we develop an online synchronous approximate optimal learning algorithm based on policy iteration to solve a multiplayer nonzero-sum game without requiring exact knowledge of the system dynamics. First, we prove that the online policy iteration algorithm for the nonzero-sum game is mathematically equivalent to a quasi-Newton iteration in a Banach space. Then, a model neural network is established to identify the unknown continuous-time nonlinear system using input-output data. For each player, a critic neural network and an action neural network are used to approximate its value function and control policy, respectively. Our algorithm only needs to tune the weights of the critic neural networks, which reduces the computational complexity of the learning process. All the neural network weights are updated online in real time, continuously and synchronously. Furthermore, the uniform ultimate bounded stability of the closed-loop system is proved based on the Lyapunov approach. Finally, two simulation examples are given to demonstrate the effectiveness of the developed scheme.
Keywords :
Lyapunov methods; Newton method; closed loop systems; computational complexity; continuous time systems; function approximation; game theory; learning (artificial intelligence); mathematics computing; neural nets; nonlinear systems; stability; Banach space; Lyapunov approach; action neural network; closed-loop system; computational complexity; control policy; input-output data; model neural network; multiplayer nonzero-sum games; online policy iteration algorithm; online synchronous approximate optimal learning algorithm; quasi-Newton iteration; uniform ultimate bounded stability; unknown continuous-time nonlinear system identification; unknown dynamics; value function approximation; Approximation algorithms; Dynamic programming; Equations; Games; Heuristic algorithms; Mathematical model; Nonlinear systems; Adaptive dynamic programming (ADP); approximate dynamic programming; multiplayer nonzero-sum games; neural networks; neuro-dynamic programming; policy iteration
fLanguage :
English
Journal_Title :
Systems, Man, and Cybernetics: Systems, IEEE Transactions on
Publisher :
IEEE
ISSN :
2168-2216
Type :
jour
DOI :
10.1109/TSMC.2013.2295351
Filename :
6710226