DocumentCode :
116277
Title :
Empirical policy iteration for approximate dynamic programming
Author :
Haskell, William B. ; Jain, Rahul ; Kalathil, Dileep
Author_Institution :
Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA
fYear :
2014
fDate :
15-17 Dec. 2014
Firstpage :
6573
Lastpage :
6578
Abstract :
We propose a simulation-based algorithm, Empirical Policy Iteration (EPI), for finding the optimal policy of an MDP with an infinite-horizon discounted cost criterion when the transition kernels are unknown. Unlike simulation-based algorithms that rely on stochastic approximation techniques, which give only asymptotic convergence results, we give provable, non-asymptotic performance guarantees in the form of sample complexity results: given ε > 0 and δ > 0, we specify the minimum number of simulation samples n(ε, δ) needed in each iteration and the minimum number of iterations k(ε, δ) that are sufficient for EPI to yield, with probability at least 1 - δ, an approximate value function that is within ε of the optimal value function.
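The sketch below illustrates one way such an empirical policy iteration could be realized on a small finite MDP: each iteration draws n simulated next states per state-action pair, forms an empirical transition kernel, and then performs policy evaluation and greedy improvement under that empirical model. It is a minimal illustration under these assumptions, not the authors' exact algorithm; all function and variable names (e.g. `sample_kernel`, `n_samples`) are hypothetical.

```python
import numpy as np

def sample_kernel(P, n, rng):
    """Estimate the transition kernel from n simulated next states per (s, a).

    The true kernel P is used only as a black-box simulator here; the
    algorithm itself sees only the empirical estimate P_hat.
    """
    S, A, _ = P.shape
    P_hat = np.zeros_like(P)
    for s in range(S):
        for a in range(A):
            draws = rng.choice(S, size=n, p=P[s, a])
            P_hat[s, a] = np.bincount(draws, minlength=S) / n
    return P_hat

def empirical_policy_iteration(P, cost, gamma, n_samples, n_iters, seed=0):
    """Run a sketch of EPI: empirical evaluation + greedy improvement."""
    rng = np.random.default_rng(seed)
    S, A, _ = P.shape
    policy = np.zeros(S, dtype=int)               # arbitrary initial policy
    v = np.zeros(S)
    for _ in range(n_iters):
        P_hat = sample_kernel(P, n_samples, rng)  # fresh samples each iteration
        # Empirical policy evaluation: solve (I - gamma * P_pi) v = c_pi
        # under the empirical kernel instead of the unknown true kernel.
        P_pi = P_hat[np.arange(S), policy]        # S x S transition matrix
        c_pi = cost[np.arange(S), policy]
        v = np.linalg.solve(np.eye(S) - gamma * P_pi, c_pi)
        # Empirical policy improvement: greedy step w.r.t. empirical Q-values.
        q = cost + gamma * P_hat @ v              # shape (S, A)
        policy = q.argmin(axis=1)                 # minimize discounted cost
    return policy, v
```

In this reading, the paper's sample complexity result corresponds to choosing `n_samples` ≥ n(ε, δ) and `n_iters` ≥ k(ε, δ) so that the returned value function is within ε of optimal with probability at least 1 - δ.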
Keywords :
Markov processes; decision making; dynamic programming; EPI algorithm; MDP; Markov decision processes; approximate dynamic programming; empirical policy iteration; infinite horizon discounted cost criteria; nonasymptotic performance guarantees; optimal policy function; optimal value function; simulation based algorithm; transition kernels; Approximation algorithms; Approximation methods; Convergence; Dynamic programming; Heuristic algorithms; Markov processes;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
2014 IEEE 53rd Annual Conference on Decision and Control (CDC)
Conference_Location :
Los Angeles, CA
Print_ISBN :
978-1-4799-7746-8
Type :
conf
DOI :
10.1109/CDC.2014.7040420
Filename :
7040420