Continuous Action-Space Reinforcement Learning Methods Applied to the Minimum-Time Swing-Up of the Acrobot

Author

Barry D. Nichols

Author_Institution

Sch. of Sci. &

fYear

2015

Firstpage

2084

Lastpage

2089

Abstract

Here I apply three reinforcement learning methods to the full, continuous-action acrobot swing-up control benchmark problem. These include two approaches from the literature, CACLA and NM-SARSA, and a novel approach which I refer to as Nelder Mead-SARSA. Like NM-SARSA, Nelder Mead-SARSA directly optimises the state-action value function for action selection, allowing continuous-action reinforcement learning without a separate policy function. However, because it uses a derivative-free method, it does not require the first or second partial derivatives of the value function. All three methods achieved good results in terms of swing-up times, comparable to previous approaches from the literature. Nelder Mead-SARSA in particular performed the swing-up in a shorter time than many approaches from the literature.
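
The idea of selecting an action by directly optimising the state-action value function with a derivative-free search can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`nelder_mead_1d`, `select_action`, `toy_q`), the action bounds of [-1, 1], the step size, and the quadratic toy value function are all assumptions made for illustration, and the search is a simplified two-point Nelder-Mead variant (reflection, expansion, and inside contraction only) for a single scalar action.

```python
def nelder_mead_1d(f, x0, step=0.5, lo=-1.0, hi=1.0, iters=60, tol=1e-6):
    """Minimise f on [lo, hi] with a simplified two-point (1-D) Nelder-Mead search.

    Only function values are used; no derivatives of f are needed.
    """
    def clip(x):
        return max(lo, min(hi, x))

    simplex = [clip(x0), clip(x0 + step)]
    if simplex[0] == simplex[1]:           # x0 was already on the upper bound
        simplex[1] = clip(x0 - step)

    for _ in range(iters):
        simplex.sort(key=f)                # best point first, worst point last
        best, worst = simplex
        if abs(best - worst) < tol:
            break
        # In 1-D the centroid of all points except the worst is the best point.
        reflected = clip(2.0 * best - worst)
        if f(reflected) < f(best):
            # Reflection improved on the best point: try going further.
            expanded = clip(3.0 * best - 2.0 * worst)
            simplex[1] = expanded if f(expanded) < f(reflected) else reflected
        else:
            # Otherwise pull the worst point halfway towards the best one.
            simplex[1] = 0.5 * (best + worst)
    return min(simplex, key=f)


def select_action(q, state, a_init=0.0):
    """Greedy action: maximise q(state, a) over a by minimising -q(state, a)."""
    return nelder_mead_1d(lambda a: -q(state, a), a_init)


def toy_q(state, action):
    """Hypothetical value function whose best action is 0.3 in every state."""
    return -(action - 0.3) ** 2


action = select_action(toy_q, state=None)
```

In a SARSA-style learner, `toy_q` would be replaced by the learned value-function approximator, and `select_action` would be called at every time step, typically with some exploration noise added to the returned action.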

Keywords

"Learning (artificial intelligence)","Training","Mathematical model","Switches","Reactive power","Newton method","Optimization"

Publisher

IEEE

Conference_Title

Systems, Man, and Cybernetics (SMC), 2015 IEEE International Conference on

Type

conf

DOI

10.1109/SMC.2015.364

Filename

7379496

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=3728260