DocumentCode
423668
Title
Model-free off-policy reinforcement learning in continuous environment
Author
Wawrzynski, Pawel ; Pacut, Andrzej
Author_Institution
Inst. of Control & Comput. Eng., Warsaw Univ. of Technol., Poland
Volume
2
fYear
2004
fDate
25-29 July 2004
Firstpage
1091
Abstract
We introduce a reinforcement learning algorithm for continuous state and action spaces. In order to construct a control policy, the algorithm utilizes the entire history of agent-environment interaction. The policy is the result of an estimation process based on all available information, rather than the result of stochastic convergence as in classical reinforcement learning approaches. The policy is derived from the history directly, not through any kind of model of the environment. We test our algorithm in a simulated cart-pole swing-up environment. The algorithm learns to control this plant in about 100 trials, which corresponds to 15 minutes of the plant's real time and is several times shorter than the learning time required by other algorithms.
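The abstract's central idea, reusing the entire interaction history off-policy instead of discarding past samples, can be illustrated with a minimal sketch. The code below is an assumption-laden illustration, not the paper's algorithm: the linear-Gaussian policy, the truncated importance weights, and the toy 1-D regulation task (standing in for cart-pole swing-up) are all choices made here for brevity.

```python
import numpy as np

# Sketch only: model-free, off-policy learning that re-estimates the policy
# from the whole agent-environment history via importance sampling.
# All parameterizations below are illustrative assumptions.

rng = np.random.default_rng(0)

class GaussianPolicy:
    """Linear-Gaussian policy: a ~ N(theta . s, sigma^2)."""
    def __init__(self, state_dim, sigma=0.3):
        self.theta = np.zeros(state_dim)
        self.sigma = sigma

    def act(self, s):
        return float(self.theta @ s + self.sigma * rng.standard_normal())

    def log_prob(self, s, a):
        # Constant terms are dropped; they cancel in importance ratios.
        return -0.5 * ((a - self.theta @ s) / self.sigma) ** 2

    def grad_log_prob(self, s, a):
        # d/dtheta of log N(a | theta.s, sigma^2)
        return (a - self.theta @ s) / self.sigma ** 2 * s

def off_policy_update(policy, history, lr=0.01, clip=10.0):
    """One gradient step estimated from the entire stored history.

    Each entry keeps the behavior log-density at which its action was
    drawn, so old data can be reweighted under the current policy;
    truncating the importance weights keeps the estimate stable.
    """
    grad = np.zeros_like(policy.theta)
    for s, a, ret, behavior_logp in history:
        w = min(np.exp(policy.log_prob(s, a) - behavior_logp), clip)
        grad += w * ret * policy.grad_log_prob(s, a)
    policy.theta += lr * grad / max(len(history), 1)

# Toy 1-D regulation task: reward for driving the state toward zero.
policy = GaussianPolicy(state_dim=1)
history = []
for episode in range(100):
    s, steps = np.array([rng.uniform(-1, 1)]), []
    for _ in range(20):
        a = policy.act(s)
        r = -(s[0] ** 2) - 0.01 * a ** 2
        steps.append((s.copy(), a, r, policy.log_prob(s, a)))
        s = np.clip(s + 0.1 * a, -2, 2)
    # Monte Carlo return from each step; append to the growing history.
    g = 0.0
    for s_t, a_t, r_t, lp in reversed(steps):
        g = r_t + 0.95 * g
        history.append((s_t, a_t, g, lp))
    off_policy_update(policy, history)

print("learned gain:", policy.theta)
```

The design point matching the abstract is that every update averages over all samples collected so far, reweighted to the current policy, rather than relying on stochastic convergence of incremental updates from the most recent data alone.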
Keywords
convergence of numerical methods; estimation theory; learning (artificial intelligence); stochastic processes; agent environment interaction; cart pole swing up simulated environment; control policy; estimation process; model free off policy reinforcement learning; stochastic convergence; Artificial intelligence; Control engineering computing; Convergence; Dynamic programming; History; Learning; Monte Carlo methods; Space technology; Stochastic processes; Testing
fLanguage
English
Publisher
ieee
Conference_Title
Proceedings of the 2004 IEEE International Joint Conference on Neural Networks (IJCNN 2004)
ISSN
1098-7576
Print_ISBN
0-7803-8359-1
Type
conf
DOI
10.1109/IJCNN.2004.1380086
Filename
1380086