Regional Information Center for Science and Technology - Simulation-based optimization of Markov decision processes: An empirical process theory approach

Title of article :

Simulation-based optimization of Markov decision processes: An empirical process theory approach

Author/Authors :

Jain, Rahul and Varaiya, Pravin

Issue Information :

Journal issue with serial number, year 2010

Pages :

From page :

1297

To page :

1304

Abstract :

We generalize and build on the PAC Learning framework for Markov Decision Processes developed in Jain and Varaiya (2006). We consider the reward function to depend on both the state and the action. Both the state and action spaces can potentially be countably infinite. We obtain an estimate for the value function of a Markov decision process, which assigns to each policy its expected discounted reward. This expected reward can be estimated as the empirical average of the reward over many independent simulation runs. We derive bounds on the number of runs needed for the convergence of the empirical average to the expected reward uniformly for a class of policies, in terms of the V-C or pseudo dimension of the policy class. We then propose a framework to obtain an ϵ-optimal policy from simulation. We provide the sample complexity of such an approach.
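The core idea in the abstract, estimating a policy's expected discounted reward by averaging over independent simulation runs and then picking the best policy in a class, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the two-state MDP, its transition probabilities `P`, its rewards `R`, the truncation horizon, and the function names are all hypothetical, and the policy class is a small finite set rather than the general classes the paper analyzes.

```python
import random

# Hypothetical two-state, two-action MDP (illustration only, not from the paper).
# P[s][a] is the probability of moving to state 1; otherwise the next state is 0.
P = {0: {0: 0.1, 1: 0.8}, 1: {0: 0.3, 1: 0.9}}
# The reward depends on both the state and the action, as in the abstract.
R = {0: {0: 0.0, 1: -0.1}, 1: {0: 1.0, 1: 0.8}}


def simulate_return(policy, gamma, horizon, rng, start_state=0):
    """One simulation run: discounted reward of a deterministic policy,
    truncated after `horizon` steps (an assumption of this sketch)."""
    state, total, discount = start_state, 0.0, 1.0
    for _ in range(horizon):
        action = policy[state]
        total += discount * R[state][action]
        discount *= gamma
        state = 1 if rng.random() < P[state][action] else 0
    return total


def empirical_value(policy, n_runs, gamma=0.9, horizon=100, seed=0):
    """Empirical average of the discounted reward over independent runs."""
    rng = random.Random(seed)
    runs = (simulate_return(policy, gamma, horizon, rng) for _ in range(n_runs))
    return sum(runs) / n_runs


def best_policy(policies, n_runs, **kwargs):
    """Return the policy in a finite class with the largest empirical value."""
    return max(policies, key=lambda pi: empirical_value(pi, n_runs, **kwargs))


# The policy class: all four deterministic stationary policies of the toy MDP.
POLICIES = [{0: a0, 1: a1} for a0 in (0, 1) for a1 in (0, 1)]
```

Calling `best_policy(POLICIES, n_runs=2000)` returns the policy with the highest empirical average. How large `n_runs` must be for that choice to be ϵ-optimal with high probability is the sample-complexity question the paper addresses; this sketch does not compute such a bound. Truncating each run at `horizon` steps biases the estimate by at most `gamma**horizon * max|R| / (1 - gamma)`.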

Keywords :

Monte Carlo simulation , stochastic control , optimization , Markov decision processes , learning algorithms

Journal title :

Automatica

Serial Year :

2010

Record number :

1448074

Link To Document :

https://search.isc.ac/dl/search/defaultta.aspx?DTC=10&DC=1448074