DocumentCode :
435023
Title :
Simulation-based uniform value function estimates of discounted and average-reward MDPs
Author :
Jain, Rahul ; Varaiya, Pravin
Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., California Univ., Berkeley, CA, USA
Volume :
4
fYear :
2004
fDate :
14-17 Dec. 2004
Firstpage :
4405
Abstract :
The value function of a Markov decision problem assigns to each policy its expected discounted reward. This expected reward can be estimated as the empirical average of the reward over many independent simulation runs. We derive bounds on the number of runs needed for the empirical average to converge to the expected reward uniformly over a class of policies, in terms of the Vapnik-Chervonenkis (VC) or pseudo-dimension of the policy class. Uniform convergence results are also obtained for the average-reward case, and they extend to partially observed MDPs (POMDPs) and Markov games. The results can be viewed as extending probably approximately correct (PAC) learning theory to POMDPs and Markov games.
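To illustrate the estimator the abstract refers to, the following is a minimal sketch (not from the paper) of estimating the discounted value of a fixed policy as the empirical average over independent simulation runs. The names (`transition`, `reward`, `policy`, `estimate_value`) and the finite-horizon truncation are assumptions for illustration; the paper's uniform bounds over a policy class are not reproduced here.

```python
import numpy as np

def simulate_discounted_reward(transition, reward, policy, gamma, s0, horizon, rng):
    """Discounted reward of one simulated trajectory.

    transition[s, a] is a probability vector over next states,
    reward[s, a] is the one-step reward, policy[s] is the action
    taken in state s, and gamma is the discount factor.
    The infinite-horizon sum is truncated at `horizon` steps.
    """
    s, total, discount = s0, 0.0, 1.0
    for _ in range(horizon):
        a = policy[s]
        total += discount * reward[s, a]
        s = rng.choice(len(transition[s, a]), p=transition[s, a])
        discount *= gamma
    return total

def estimate_value(transition, reward, policy, gamma, s0,
                   n_runs=1000, horizon=200, seed=0):
    """Empirical average of the discounted reward over n_runs independent runs."""
    rng = np.random.default_rng(seed)
    runs = [simulate_discounted_reward(transition, reward, policy,
                                       gamma, s0, horizon, rng)
            for _ in range(n_runs)]
    return float(np.mean(runs))
```

The paper's contribution is a bound on how large `n_runs` must be so that this empirical average is uniformly close to the true expected reward for every policy in a class, with the bound depending on the VC or pseudo-dimension of that class.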
Keywords :
Markov processes; decision theory; game theory; Markov games; V-C dimension; average-reward Markov decision problem; discounted reward Markov decision problem; partially observable Markov decision problem; probably approximately correct learning theory; pseudo dimension; simulation-based uniform value function estimates; uniform convergence; Computational modeling; Convergence; Dynamic programming; Equations; Game theory; Optimal control; Space stations; State estimation; State-space methods; Stochastic processes;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
43rd IEEE Conference on Decision and Control (CDC), 2004
ISSN :
0191-2216
Print_ISBN :
0-7803-8682-5
Type :
conf
DOI :
10.1109/CDC.2004.1429444
Filename :
1429444