Title :
Empirical policy iteration for approximate dynamic programming
Author :
Haskell, William B. ; Jain, Rahul ; Kalathil, Dileep
Author_Institution :
Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA
Abstract :
We propose a simulation-based algorithm, Empirical Policy Iteration (EPI), for finding the optimal policy of an MDP under the infinite-horizon discounted-cost criterion when the transition kernels are unknown. Unlike simulation-based algorithms that rely on stochastic approximation techniques, which give only asymptotic convergence results, we provide provable, non-asymptotic performance guarantees in the form of sample complexity bounds: given ε > 0 and δ > 0, we specify the number of simulation samples n(ε, δ) needed in each iteration and the number of iterations k(ε, δ) that together suffice for EPI to yield, with probability at least 1 − δ, an approximate value function that is within ε of the optimal value function.
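The sketch below illustrates the general idea behind simulation-based (empirical) policy iteration as described in the abstract: exact expectations in policy evaluation and improvement are replaced by averages over n sampled next states. It is not the authors' exact procedure; the simulator interface, the cost array, the inner-sweep count, and all parameter names are assumptions made for illustration, with n_samples and n_iters playing the roles of n(ε, δ) and k(ε, δ).

import numpy as np

def empirical_policy_iteration(simulator, n_states, n_actions, cost,
                               gamma=0.9, n_samples=100, n_iters=50,
                               n_sweeps=200):
    """Sketch of empirical policy iteration for a discounted-cost MDP.

    simulator(s, a, size): assumed interface returning `size` i.i.d.
        next-state samples (an int array) from the unknown kernel.
    cost[s, a]: one-step cost of taking action a in state s.
    """
    policy = np.zeros(n_states, dtype=int)   # arbitrary initial policy
    V = np.zeros(n_states)
    for _ in range(n_iters):
        # Empirical policy evaluation: iterate the empirical Bellman
        # operator for the current policy, with expectations replaced
        # by sample means over n_samples simulated transitions.
        for _ in range(n_sweeps):
            V_new = np.empty(n_states)
            for s in range(n_states):
                nxt = simulator(s, policy[s], n_samples)
                V_new[s] = cost[s, policy[s]] + gamma * V[nxt].mean()
            V = V_new
        # Empirical policy improvement: greedy with respect to
        # sampled Q-values (minimizing discounted cost).
        for s in range(n_states):
            q = [cost[s, a] + gamma * V[simulator(s, a, n_samples)].mean()
                 for a in range(n_actions)]
            policy[s] = int(np.argmin(q))
    return policy, V

In a test, simulator could simply draw from a known transition matrix (e.g., via np.random.default_rng().choice), which makes it easy to compare the returned V against the exact optimal value function.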
Keywords :
Markov processes; decision making; dynamic programming; EPI algorithm; MDP; Markov decision processes; approximate dynamic programming; empirical policy iteration; infinite horizon discounted cost criteria; nonasymptotic performance guarantees; optimal policy function; optimal value function; simulation based algorithm; transition kernels; Approximation algorithms; Approximation methods; Convergence; Dynamic programming; Heuristic algorithms
Conference_Titel :
2014 IEEE 53rd Annual Conference on Decision and Control (CDC)
Conference_Location :
Los Angeles, CA
Print_ISBN :
978-1-4799-7746-8
DOI :
10.1109/CDC.2014.7040420