DocumentCode :
2772904
Title :
A comparison of learning speed and ability to cope without exploration between DHP and TD(0)
Author :
Fairbank, Michael ; Alonso, Eduardo
Author_Institution :
Dept. of Comput., City Univ. London, London, UK
fYear :
2012
fDate :
10-15 June 2012
Firstpage :
1
Lastpage :
8
Abstract :
This paper demonstrates the principal motivations for using Dual Heuristic Dynamic Programming (DHP) learning methods in Adaptive Dynamic Programming and Reinforcement Learning in continuous state spaces, namely automatic local exploration, improved learning speed, and the ability to work without stochastic exploration in deterministic environments. In a simple experiment, DHP is shown to learn around 1700 times faster than TD(0). DHP solves the problem without any exploration, whereas TD(0) cannot solve it without explicit exploration. DHP requires knowledge of, and differentiability of, the environment's model functions. This paper aims to illustrate the advantages of DHP when these two requirements are satisfied.
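The abstract's point that DHP needs a known, differentiable model can be made concrete by comparing the two critic updates side by side. The sketch below is illustrative only and is not the authors' experiment. It assumes a hypothetical 1-D deterministic system x' = a*x with the policy held fixed and folded into the transition, a reward of -x**2, and a quadratic critic V(x) = w*x**2. TD(0) learns V directly from sampled rewards. DHP instead learns lambda(x) = dV/dx, and its target needs the model derivative f'(x) and the reward derivative dU/dx. All names (`model`, `model_dx`, `td0_step`, `dhp_step`, `train`) are hypothetical.

```python
"""Toy comparison of TD(0) and DHP critic updates (policy evaluation only).

Assumptions (illustrative, not from the paper): 1-D deterministic system
x' = A*x under a fixed policy, reward U(x) = -x**2, discount GAMMA,
and a quadratic critic V(x) = w*x**2, so lambda(x) = dV/dx = 2*w*x.
"""

A, GAMMA = 0.9, 0.9


def model(x):
    """Known deterministic transition f(x)."""
    return A * x


def model_dx(x):
    """Model derivative f'(x); DHP needs this, TD(0) does not."""
    return A


def reward(x):
    return -x * x


def reward_dx(x):
    """Reward derivative dU/dx; also needed only by DHP."""
    return -2.0 * x


def td0_step(w, x, alpha):
    """TD(0): move V(x) toward U(x) + GAMMA * V(f(x))."""
    x_next = model(x)
    delta = reward(x) + GAMMA * w * x_next**2 - w * x**2
    return w + alpha * delta * x**2  # dV/dw = x**2


def dhp_step(w, x, alpha):
    """DHP: move lambda(x) toward dU/dx + GAMMA * f'(x) * lambda(f(x))."""
    x_next = model(x)
    target = reward_dx(x) + GAMMA * model_dx(x) * (2.0 * w * x_next)
    error = target - 2.0 * w * x
    return w + alpha * error * 2.0 * x  # d(lambda)/dw = 2*x


def train(step, episodes=200, alpha=0.05, x0=1.0, horizon=30):
    """Run greedy trajectories from one fixed start state (no exploration)."""
    w = 0.0
    for _ in range(episodes):
        x = x0
        for _ in range(horizon):
            w = step(w, x, alpha)
            x = model(x)
    return w


# Closed-form value weight for this toy system: V(x) = -x**2 / (1 - GAMMA*A**2).
w_true = -1.0 / (1.0 - GAMMA * A * A)
```

Both critics share the same fixed point `w_true` here, so the sketch shows only where the model derivatives enter the DHP target. Calling `train(td0_step, episodes=20)` and `train(dhp_step, episodes=20)` lets one compare how far each has moved toward `w_true` after a few episodes. No speed ratio from the paper is implied. This toy also does not reproduce the control setting in which the paper reports TD(0) failing without exploration.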
Keywords :
dynamic programming; learning (artificial intelligence); DHP; TD(0); adaptive dynamic programming; automatic local exploration; continuous state spaces; dual heuristic dynamic programming learning methods; learning speed; reinforcement learning; Algorithm design and analysis; Approximation algorithms; Equations; Heuristic algorithms; Stochastic processes; Trajectory; Vectors; Adaptive Dynamic Programming; DHP; Dual Heuristic Dynamic Programming; Reinforcement Learning;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Neural Networks (IJCNN), The 2012 International Joint Conference on
Conference_Location :
Brisbane, QLD
ISSN :
2161-4393
Print_ISBN :
978-1-4673-1488-6
Electronic_ISBN :
2161-4393
Type :
conf
DOI :
10.1109/IJCNN.2012.6252569
Filename :
6252569