DocumentCode
2060865
Title
PAC learning for Markov decision processes and dynamic games
Author
Jain, Rahul ; Varaiya, Pravin P.
Author_Institution
EECS Dept., California Univ., Berkeley, CA, USA
fYear
2004
fDate
27 June-2 July 2004
Firstpage
468
Abstract
We extend the probably approximately correct (PAC) model of learning to Markov decision processes (MDPs) and dynamic games. We obtain simulation-based uniform sample complexity bounds for value function estimates of discounted reward MDPs. We also obtain uniform sample complexity results for Markov games with a finite number of players.
Keywords
Markov processes; decision theory; game theory; Markov decision process; Markov game; dynamic game; function estimation; probably approximately correct learning; sample complexity bound; Contracts; Convergence; Noise generators; Space stations; State-space methods; Stochastic processes
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Theory, 2004. ISIT 2004. Proceedings. International Symposium on
Print_ISBN
0-7803-8280-3
Type
conf
DOI
10.1109/ISIT.2004.1365505
Filename
1365505