DocumentCode :
184857
Title :
Empirical Value Iteration for approximate dynamic programming
Author :
Haskell, William B. ; Jain, R. ; Kalathil, Dileep
Author_Institution :
Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA
fYear :
2014
fDate :
4-6 June 2014
Firstpage :
495
Lastpage :
500
Abstract :
We propose a simulation-based algorithm, the Empirical Value Iteration (EVI) algorithm, for finding the optimal value function of an MDP with an infinite-horizon discounted cost criterion when the transition probability kernels are unknown. Unlike simulation-based algorithms that use stochastic approximation techniques, which give only asymptotic convergence results, we give provable, non-asymptotic performance guarantees in terms of sample complexity: given ε > 0 and δ > 0, we specify the minimum number of simulation samples n(ε; δ) needed in each iteration and the minimum number of iterations t(ε; δ) that are sufficient for EVI to yield, with probability at least 1 - δ, an approximate value function that is within ε of the optimal value function.
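The idea in the abstract can be illustrated with a minimal Python sketch: at each iteration, the expectation in the Bellman update is replaced by an average over n simulated next states. This is an illustration under stated assumptions, not the authors' implementation. The 3-state MDP, the discount factor, the function names, and the sample and iteration counts are all hypothetical; in particular, the counts are not the bounds n(ε; δ) and t(ε; δ) derived in the paper.

```python
import random

GAMMA = 0.9  # discount factor (assumed for illustration)

# Hypothetical 3-state, 2-action MDP. COST[s][a] is the one-step cost.
COST = [[1.0, 2.0], [0.5, 1.5], [2.0, 0.2]]
# P[s][a] is the next-state distribution. The empirical algorithm never
# reads it directly; it only draws samples through simulate().
P = [
    [[0.8, 0.2, 0.0], [0.1, 0.6, 0.3]],
    [[0.5, 0.5, 0.0], [0.0, 0.3, 0.7]],
    [[0.3, 0.3, 0.4], [0.0, 0.1, 0.9]],
]


def simulate(s, a, n, rng):
    """Draw n next states for (s, a); stands in for a black-box simulator."""
    return rng.choices(range(len(P)), weights=P[s][a], k=n)


def empirical_value_iteration(n_samples, n_iters, seed=0):
    """Value iteration with the expectation replaced by a sample average."""
    rng = random.Random(seed)
    n_states, n_actions = len(COST), len(COST[0])
    v = [0.0] * n_states
    for _ in range(n_iters):
        new_v = []
        for s in range(n_states):
            q_values = []
            for a in range(n_actions):
                # Fresh samples are drawn in every iteration.
                nxt = simulate(s, a, n_samples, rng)
                avg = sum(v[j] for j in nxt) / n_samples
                q_values.append(COST[s][a] + GAMMA * avg)
            new_v.append(min(q_values))  # cost criterion, so minimise
        v = new_v
    return v


def exact_value_iteration(n_iters):
    """Reference: classical value iteration using the true kernel P."""
    n_states, n_actions = len(COST), len(COST[0])
    v = [0.0] * n_states
    for _ in range(n_iters):
        v = [
            min(
                COST[s][a] + GAMMA * sum(p * v[j] for j, p in enumerate(P[s][a]))
                for a in range(n_actions)
            )
            for s in range(n_states)
        ]
    return v


if __name__ == "__main__":
    v_hat = empirical_value_iteration(n_samples=2000, n_iters=100)
    v_star = exact_value_iteration(300)
    print("empirical:", [round(x, 3) for x in v_hat])
    print("exact:    ", [round(x, 3) for x in v_star])
```

Comparing the two outputs gives a rough feel for how the per-iteration sample count affects the gap to the optimal value function; the exact solver is included only as a reference and would not be available when the kernel is truly unknown.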
Keywords :
Markov processes; approximation theory; convergence of numerical methods; decision theory; dynamic programming; iterative methods; probability; stochastic processes; EVI algorithm; MDP; Markov decision processes; approximate dynamic programming; approximate value function; asymptotic convergence; empirical value iteration algorithm; infinite horizon discounted cost criteria; nonasymptotic performance guarantees; optimal value function; simulation based algorithm; stochastic approximation techniques; transition probability kernels; Algorithm design and analysis; Approximation algorithms; Approximation methods; Convergence; Markov processes; Random variables; Learning; Markov processes; Optimization algorithms;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
American Control Conference (ACC), 2014
Conference_Location :
Portland, OR
ISSN :
0743-1619
Print_ISBN :
978-1-4799-3272-6
Type :
conf
DOI :
10.1109/ACC.2014.6859320
Filename :
6859320