Title :
Online solution of nonlinear two-player zero-sum games using synchronous policy iteration
Author :
Vamvoudakis, Kyriakos G. ; Lewis, F.L.
Author_Institution :
Autom. & Robot. Res. Inst., Univ. of Texas at Arlington, Fort Worth, TX, USA
Abstract :
In this paper we present an online gaming algorithm based on policy iteration to solve the continuous-time (CT) two-player zero-sum game with infinite-horizon cost for nonlinear systems with known dynamics. That is, the algorithm learns online, in real time, the solution to the game design Hamilton-Jacobi-Isaacs (HJI) equation. The method finds in real time suitable approximations of the optimal value and of the saddle-point control and disturbance policies, while also guaranteeing closed-loop stability. The adaptive algorithm is implemented as an actor/critic structure that involves simultaneous continuous-time adaptation of the critic, control actor, and disturbance neural networks. We call this online gaming algorithm 'synchronous' zero-sum game policy iteration. A persistence of excitation condition is shown to guarantee convergence of the critic to the actual optimal value function. Novel tuning algorithms are given for the critic, actor, and disturbance networks. Convergence to the optimal saddle-point solution is proven, and stability of the system is also guaranteed. Simulation examples show the effectiveness of the new algorithm.
Keywords :
adaptive control; closed loop systems; continuous time systems; control system synthesis; game theory; infinite horizon; neurocontrollers; nonlinear control systems; optimal control; stability; adaptive algorithm; closed-loop stability; continuous-time adaptation; continuous-time two-player zero-sum game; control actor; disturbance network; disturbance neural network; disturbance policy; excitation condition; game design HJI equation; infinite horizon cost; nonlinear system; nonlinear two-player zero-sum game; online gaming algorithm; optimal saddle point solution; saddle point control policy; synchronous policy iteration; synchronous zero-sum game policy iteration; tuning algorithm; Approximation algorithms; Artificial neural networks; Convergence; Equations; Function approximation; Games; Approximate Dynamic Programming; H-infinity; Hamilton-Jacobi-Isaacs equation; Nash-equilibrium; Persistence of Excitation; Policy Iteration; Synchronous Zero-Sum Game Policy Iteration;
Conference_Title :
Decision and Control (CDC), 2010 49th IEEE Conference on
Conference_Location :
Atlanta, GA
Print_ISBN :
978-1-4244-7745-6
DOI :
10.1109/CDC.2010.5717607