  • DocumentCode
    423668
  • Title
    Model-free off-policy reinforcement learning in continuous environment
  • Author
    Wawrzyński, Paweł ; Pacut, Andrzej

  • Author_Institution
    Inst. of Control & Comput. Eng., Warsaw Univ. of Technol., Poland
  • Volume
    2
  • fYear
    2004
  • fDate
    25-29 July 2004
  • Firstpage
    1091
  • Abstract
    We introduce a reinforcement learning algorithm for continuous state and action spaces. To construct a control policy, the algorithm utilizes the entire history of agent-environment interaction. The policy is the result of an estimation process based on all available information, rather than the result of stochastic convergence as in classical reinforcement learning approaches. The policy is derived from the history directly, not through any kind of model of the environment. We test our algorithm in the simulated cart-pole swing-up environment. The algorithm learns to control this plant in about 100 trials, which corresponds to 15 minutes of the plant's real time and is several times shorter than the time required by other algorithms. (An illustrative sketch of this off-policy reuse of history appears after the record below.)
  • Keywords
    convergence of numerical methods; estimation theory; learning (artificial intelligence); stochastic processes; agent-environment interaction; cart-pole swing-up simulated environment; control policy; estimation process; model-free off-policy reinforcement learning; stochastic convergence; Artificial intelligence; Control engineering computing; Convergence; Dynamic programming; History; Learning; Monte Carlo methods; Space technology; Stochastic processes; Testing
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    2004 IEEE International Joint Conference on Neural Networks (IJCNN 2004), Proceedings
  • ISSN
    1098-7576
  • Print_ISBN
    0-7803-8359-1
  • Type
    conf
  • DOI
    10.1109/IJCNN.2004.1380086
  • Filename
    1380086
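
A note on the abstract's central idea, with a sketch. The algorithm described above constructs its policy by estimation over the whole stored interaction history, off-policy, rather than by incremental stochastic convergence. The short Python sketch below illustrates that reuse pattern with importance sampling. Everything in it is an illustrative assumption (the toy one-step environment, the linear-Gaussian policy, the truncation constant, and all names); none of it is the authors' actual implementation, which is specified in the paper itself.

# Minimal sketch (not the paper's algorithm): off-policy policy-gradient
# estimation that reuses the ENTIRE interaction history via importance
# sampling, in a toy one-step continuous environment.
import numpy as np

rng = np.random.default_rng(0)

SIGMA = 0.5          # fixed exploration noise of the policy
theta = np.zeros(2)  # linear-Gaussian policy: a ~ N(theta @ s, SIGMA^2)

def log_prob(th, s, a):
    """Log-density of action a in state s under parameters th."""
    mu = s @ th
    return -0.5 * ((a - mu) / SIGMA) ** 2 - np.log(SIGMA * np.sqrt(2.0 * np.pi))

history = []  # every (state, action, reward, behaviour log-prob) ever seen

def run_trial(th):
    """One interaction: sample a state, act, collect a quadratic reward."""
    s = rng.normal(size=2)
    a = s @ th + SIGMA * rng.normal()
    r = -(a - s.sum()) ** 2  # the best action is a = s[0] + s[1]
    history.append((s, a, r, log_prob(th, s, a)))

for trial in range(300):
    run_trial(theta)
    # Re-estimate the policy gradient from ALL samples gathered so far,
    # weighting each by pi_theta / pi_behaviour so that old, off-policy
    # samples stay usable; weights are truncated for numerical stability.
    grad = np.zeros_like(theta)
    for s, a, r, log_b in history:
        w = min(np.exp(log_prob(theta, s, a) - log_b), 10.0)
        grad += w * r * (a - s @ theta) / SIGMA ** 2 * s  # REINFORCE term
    theta += 0.05 * grad / len(history)

print("learned theta:", theta)  # should move toward [1, 1]

The contrast with classical on-policy methods, which this sketch is meant to make concrete, is that no sample is ever discarded: every stored transition keeps contributing to later updates, which is the mechanism behind the sample efficiency the abstract reports (about 100 trials, roughly 15 minutes of plant time).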