DocumentCode :
2814828
Title :
Adversarial multi-armed bandit approach to two-person zero-sum Markov games
Author :
Chang, Hyeong Soo ; Fu, Michael C. ; Marcus, Steven I.
Author_Institution :
Sogang Univ., Seoul
fYear :
2007
fDate :
12-14 Dec. 2007
Firstpage :
127
Lastpage :
132
Abstract :
A sampling-based algorithm for solving stochastic optimization problems, based on Auer et. al.\´s Exp3 algorithm for "adversarial multi-armed bandit problems," has been recently presented by the authors. In particular, the authors recursively extended the Exp3-based algorithm for solving finite-horizon Markov decision processes (MDPs) and analyzed its finite-iteration performance in terms of the expected bias relative to the maximum value of the "recursive sample- average-approximation (SAA)" problem induced by the sampling process in the algorithm, showing that the upper bound of the expected bias approaches zero as the sampling size per state sampled in each stage goes to infinity, leading to the convergence to the optimal value of the original MDP problem in the limit. As a sequel to the previous work, the idea is further extended for solving two-person zero-sum Markov games (MGs), providing a finite-iteration bound to the equilibrium value of the induced "recursive SAA game" problem and asymptotic convergence to the true equilibrium value. The recursively extended algorithm for MGs can be used for breaking the curse of dimensionality.
Keywords :
Markov processes; convergence; game theory; optimisation; sampling methods; Exp3 algorithm; adversarial multiarmed bandit problem; asymptotic convergence; finite iteration performance; finite-horizon Markov decision process; recursive sample-average-approximation problem; sampling-based algorithm; state sampling; stochastic optimization problems; two-person zero-sum Markov games; Approximation algorithms; Convergence; Game theory; H infinity control; Performance analysis; Probability distribution; Sampling methods; State-space methods; Stochastic processes; Upper bound; Markov decision process; Markov game; sample average approximation; sampling;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Decision and Control, 2007 46th IEEE Conference on
Conference_Location :
New Orleans, LA
ISSN :
0191-2216
Print_ISBN :
978-1-4244-1497-0
Electronic_ISBN :
0191-2216
Type :
conf
DOI :
10.1109/CDC.2007.4434044
Filename :
4434044
Link To Document :
بازگشت