DocumentCode :
1743644
Title :
Approximate value iteration with randomized policies
Author :
De Farias, D.P.; Van Roy, B.
Author_Institution :
Stanford Univ., CA, USA
Volume :
4
fYear :
2000
fDate :
2000
Firstpage :
3421
Abstract :
The curse of dimensionality in dynamic programming prevents, in most problems of practical interest, the exact computation of the value function. We study the fixed points of approximate value iteration, a simple algorithm that combats the curse of dimensionality by generating approximate iterates of the classical value iteration algorithm in the span of a set of prespecified basis functions. We show that, in general, the modified dynamic programming operator need not possess a fixed point, and therefore approximate value iteration should not be expected to converge. However, by using a class of randomized policies, approximate value iteration is guaranteed to possess at least one fixed point. Finally, we discuss the link between approximate value iteration and temporal-difference learning (TD), and show that the existence of fixed points for approximate value iteration implies the existence of stationary points for the ordinary differential equation approximated by a version of TD that incorporates “exploration”.
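The scheme described in the abstract can be illustrated with a small numerical sketch. The Python code below is a hypothetical, minimal illustration only: the toy MDP, the softmax temperature, and the unweighted least-squares projection are assumptions for the example, not the paper's exact construction. It iterates a smoothed Bellman operator induced by a Boltzmann (randomized) policy and projects each iterate onto the span of prespecified basis functions.

```python
import numpy as np

# Sketch of approximate value iteration with a Boltzmann (softmax)
# randomized policy. Problem sizes, the random MDP, and the temperature
# are assumptions chosen for illustration.

n_states, n_actions, n_features = 6, 2, 3
gamma, temperature = 0.9, 0.5
rng = np.random.default_rng(0)

# Random MDP: P[a] is a row-stochastic transition matrix, r[a] a reward vector.
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)
r = rng.random((n_actions, n_states))

# Basis functions: the columns of Phi span the approximation subspace.
Phi = rng.random((n_states, n_features))
proj = Phi @ np.linalg.solve(Phi.T @ Phi, Phi.T)  # orthogonal projection onto span(Phi)

def bellman_softmax(J):
    """Smoothed Bellman backup: expectation under a Boltzmann policy
    rather than the usual maximization over actions."""
    Q = np.array([r[a] + gamma * P[a] @ J for a in range(n_actions)])  # shape (A, S)
    w = np.exp(Q / temperature)
    pi = w / w.sum(axis=0, keepdims=True)  # randomized policy over actions
    return (pi * Q).sum(axis=0)

# Approximate value iteration: project each smoothed backup onto span(Phi).
J = np.zeros(n_states)
for _ in range(200):
    J = proj @ bellman_softmax(J)

print("approximate fixed-point weights:", np.linalg.lstsq(Phi, J, rcond=None)[0])
```

With the hard max in place of the softmax weighting, the composed operator need not have a fixed point, which is the failure mode the paper analyzes; the randomized (softmax) policy is what makes existence of a fixed point possible in this sketch.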
Keywords :
Markov processes; dynamic programming; function approximation; iterative methods; learning (artificial intelligence); approximate value iteration; modified dynamic programming operator; randomized policies; temporal-difference learning; Control systems; Differential equations; Dynamic programming; State-space methods; Stochastic processes;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Proceedings of the 39th IEEE Conference on Decision and Control, 2000
Conference_Location :
Sydney, NSW
ISSN :
0191-2216
Print_ISBN :
0-7803-6638-7
Type :
conf
DOI :
10.1109/CDC.2000.912232
Filename :
912232