Title :
Adaptive zero-sum stochastic game for two finite Markov chains
Author :
Poznyak, A.S. ; Najim, K.
Author_Institution :
Control Autom., CINVESTAV-IPN, Mexico City, Mexico
Abstract :
A repeated zero-sum stochastic game between two finite Markov chains with unknown transition matrices and payoffs is considered. The control objective is to obtain the equilibrium point based only on current measurements. The behavior of each player is modelled by a finite controlled Markov chain. A novel adaptive policy is developed based on Lagrange multipliers involved in a “learning through reinforcement” procedure. A regularized Lagrange function and a new normalization procedure are introduced. The saddle-point of this function is shown to be unique. The convergence properties are proved and the order of almost sure convergence is estimated as n^(-1/3).
Keywords :
Lyapunov methods; Markov processes; convergence; matrix algebra; probability; stochastic games; adaptive policy; adaptive zero-sum stochastic game; control objective; convergence properties; equilibrium point; finite controlled Markov chain; normalization procedure; regularized Lagrange function; reinforcement learning; repeated game; saddle-point; Adaptive control; Automatic control; Convergence; Current measurement; Laboratories; Lagrangian functions; Process control; Programmable control; Recursive estimation; Stochastic processes;
Conference_Title :
Decision and Control, 2000. Proceedings of the 39th IEEE Conference on
Conference_Location :
Sydney, NSW
Print_ISBN :
0-7803-6638-7
DOI :
10.1109/CDC.2000.912852