DocumentCode :
435023
Title :
Simulation-based uniform value function estimates of discounted and average-reward MDPs
Author :
Jain, Rahul ; Varaiya, Pravin
Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., California Univ., Berkeley, CA, USA
Volume :
4
fYear :
2004
fDate :
14-17 Dec. 2004
Firstpage :
4405
Abstract :
The value function of a Markov decision problem assigns to each policy its expected discounted reward. This expected reward can be estimated as the empirical average of the reward over many independent simulation runs. We derive bounds on the number of runs needed for the empirical average to converge to the expected reward uniformly over a class of policies, in terms of the Vapnik-Chervonenkis (VC) or pseudo-dimension of the policy class. Uniform convergence results are also obtained for the average-reward case, and they extend to partially observed MDPs (POMDPs) and Markov games. The results can be viewed as extending probably approximately correct (PAC) learning theory to POMDPs and Markov games.
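To illustrate the estimator the abstract refers to, the following is a minimal sketch (not from the paper) of estimating the discounted value of a fixed policy as the empirical average over independent simulation runs. The names (`transition`, `reward`, `policy`, `estimate_value`) and the finite-horizon truncation are assumptions for illustration; the paper's uniform bounds over a policy class are not reproduced here.

```python
import numpy as np

def simulate_discounted_reward(transition, reward, policy, gamma, s0, horizon, rng):
    """Discounted reward of one simulated trajectory.

    transition[s, a] is a probability vector over next states,
    reward[s, a] is the one-step reward, policy[s] is the action
    taken in state s, and gamma is the discount factor.
    The infinite-horizon sum is truncated at `horizon` steps.
    """
    s, total, discount = s0, 0.0, 1.0
    for _ in range(horizon):
        a = policy[s]
        total += discount * reward[s, a]
        s = rng.choice(len(transition[s, a]), p=transition[s, a])
        discount *= gamma
    return total

def estimate_value(transition, reward, policy, gamma, s0,
                   n_runs=1000, horizon=200, seed=0):
    """Empirical average of the discounted reward over n_runs independent runs."""
    rng = np.random.default_rng(seed)
    runs = [simulate_discounted_reward(transition, reward, policy,
                                       gamma, s0, horizon, rng)
            for _ in range(n_runs)]
    return float(np.mean(runs))
```

The paper's contribution is a bound on how large `n_runs` must be so that this empirical average is uniformly close to the true expected reward for every policy in a class, with the bound depending on the VC or pseudo-dimension of that class.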
Keywords :
Markov processes; decision theory; game theory; Markov games; V-C dimension; average-reward Markov decision problem; discounted reward Markov decision problem; partially observable Markov decision problem; probably approximately correct learning theory; pseudo dimension; simulation-based uniform value function estimates; uniform convergence; Computational modeling; Convergence; Dynamic programming; Equations; Game theory; Optimal control; Space stations; State estimation; State-space methods; Stochastic processes;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
43rd IEEE Conference on Decision and Control (CDC), 2004
ISSN :
0191-2216
Print_ISBN :
0-7803-8682-5
Type :
conf
DOI :
10.1109/CDC.2004.1429444
Filename :
1429444