• DocumentCode
    3728260
  • Title
    Continuous Action-Space Reinforcement Learning Methods Applied to the Minimum-Time Swing-Up of the Acrobot
  • Author
    Barry D. Nichols
  • Author_Institution
    Sch. of Sci. &
  • fYear
    2015
  • Firstpage
    2084
  • Lastpage
    2089
  • Abstract
    Here I apply three reinforcement learning methods to the full, continuous-action acrobot swing-up control benchmark problem. These include two approaches from the literature, CACLA and NM-SARSA, and a novel approach which I refer to as Nelder Mead-SARSA. Nelder Mead-SARSA, like NM-SARSA, directly optimises the state-action value function for action selection, allowing continuous-action reinforcement learning without a separate policy function. However, because it uses a derivative-free method, it does not require the first or second partial derivatives of the value function. All three methods achieved good swing-up times, comparable to previous approaches from the literature. Nelder Mead-SARSA in particular performed the swing-up in a shorter time than many approaches from the literature.
  • Keywords
    "Learning (artificial intelligence)","Training","Mathematical model","Switches","Reactive power","Newton method","Optimization"
  • Publisher
    IEEE
  • Conference_Titel
    Systems, Man, and Cybernetics (SMC), 2015 IEEE International Conference on
  • Type
    conf
  • DOI
    10.1109/SMC.2015.364
  • Filename
    7379496
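The abstract's central idea is that Nelder Mead-SARSA chooses a continuous action by directly maximising the state-action value function with a derivative-free optimiser rather than by querying a separate policy. Below is a minimal sketch of that action-selection step. It assumes an action box [lower, upper], a textbook Nelder-Mead simplex with the usual reflection, expansion, contraction and shrink coefficients (1, 2, 0.5, 0.5), and a hypothetical `toy_q` standing in for a learned value function. The names `nelder_mead_max`, `greedy_action` and `toy_q`, the starting action, the step size and the tolerances are illustrative assumptions. They do not reflect the paper's implementation, parameters, or function approximator.

```python
from typing import Callable, List, Sequence


def _clip(x: Sequence[float], lower: float, upper: float) -> List[float]:
    # Keep every action component inside the assumed box [lower, upper].
    return [min(max(v, lower), upper) for v in x]


def _along(centroid: List[float], worst: List[float], t: float) -> List[float]:
    # Point on the line from the worst vertex through the centroid:
    # t=1 reflects, t=2 expands, t=-0.5 is an inside contraction.
    return [c + t * (c - w) for c, w in zip(centroid, worst)]


def nelder_mead_max(
    f: Callable[[List[float]], float],
    x0: Sequence[float],
    lower: float,
    upper: float,
    step: float = 0.5,
    iters: int = 100,
    tol: float = 1e-8,
) -> List[float]:
    """Maximise f inside a box with a basic Nelder-Mead simplex (no derivatives)."""
    n = len(x0)

    def cost(x: List[float]) -> float:
        return -f(x)  # Nelder-Mead minimises, so minimise -f

    # Initial simplex: x0 plus one vertex offset along each coordinate axis.
    simplex = [_clip(x0, lower, upper)]
    for i in range(n):
        v = list(x0)
        v[i] += step
        simplex.append(_clip(v, lower, upper))
    costs = [cost(v) for v in simplex]

    for _ in range(iters):
        # Order vertices from best (lowest cost) to worst.
        order = sorted(range(n + 1), key=lambda k: costs[k])
        simplex = [simplex[k] for k in order]
        costs = [costs[k] for k in order]
        if costs[-1] - costs[0] < tol:
            break
        # Centroid of every vertex except the worst one.
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]

        xr = _clip(_along(centroid, worst, 1.0), lower, upper)  # reflection
        cr = cost(xr)
        if cr < costs[0]:
            xe = _clip(_along(centroid, worst, 2.0), lower, upper)  # expansion
            ce = cost(xe)
            simplex[-1], costs[-1] = (xe, ce) if ce < cr else (xr, cr)
        elif cr < costs[-2]:
            simplex[-1], costs[-1] = xr, cr
        else:
            xc = _clip(_along(centroid, worst, -0.5), lower, upper)  # contraction
            cc = cost(xc)
            if cc < costs[-1]:
                simplex[-1], costs[-1] = xc, cc
            else:
                # Shrink all vertices halfway toward the best vertex.
                best = simplex[0]
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, v)] for v in simplex[1:]
                ]
                costs = [costs[0]] + [cost(v) for v in simplex[1:]]

    best_k = min(range(n + 1), key=lambda k: costs[k])
    return simplex[best_k]


def toy_q(state: List[float], action: List[float]) -> float:
    # Hypothetical stand-in for a learned Q(s, a); its peak is at a = 0.3 * s[0].
    return -(action[0] - 0.3 * state[0]) ** 2


def greedy_action(
    q: Callable[[List[float], List[float]], float],
    state: List[float],
    lower: float = -1.0,
    upper: float = 1.0,
) -> List[float]:
    # Derivative-free greedy action: maximise Q(state, .) over the action box.
    return nelder_mead_max(lambda a: q(state, a), [0.0], lower, upper)
```

For example, `greedy_action(toy_q, [1.0])` converges near the interior peak at 0.3. For `[5.0]`, the unconstrained peak (1.5) lies outside the assumed box, so the result is the bound 1.0. Only evaluations of Q are needed, which is the property the abstract contrasts with the derivative-based NM-SARSA. A full SARSA agent would also add exploration on top of this greedy step, which is omitted here.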