DocumentCode
3728260
Title
Continuous Action-Space Reinforcement Learning Methods Applied to the Minimum-Time Swing-Up of the Acrobot
Author
Barry D. Nichols
Author_Institution
Sch. of Sci. &
fYear
2015
Firstpage
2084
Lastpage
2089
Abstract
Here I apply three reinforcement learning methods to the full, continuous-action, swing-up acrobot control benchmark problem. These include two approaches from the literature, CACLA and NM-SARSA, and a novel approach which I refer to as Nelder Mead-SARSA. Nelder Mead-SARSA, like NM-SARSA, directly optimises the state-action value function for action selection, allowing continuous-action reinforcement learning without a separate policy function. However, because it uses a derivative-free method, it does not require the first or second partial derivatives of the value function. All three methods achieved good results in terms of swing-up times, comparable to previous approaches from the literature. Nelder Mead-SARSA in particular performed the swing-up in a shorter time than many approaches from the literature.
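The action-selection idea described in the abstract (maximising the state-action value over a continuous action with a derivative-free search) can be sketched roughly as follows. This is only an illustrative sketch, not the paper's implementation: `toy_q` is a hypothetical stand-in for a learned value function, the action bounds of [-1, 1] are assumed, and `nelder_mead_1d` is a simplified one-dimensional variant of the Nelder-Mead simplex method (reflection, expansion, inside contraction) written for this example.

```python
from typing import Callable


def nelder_mead_1d(f: Callable[[float], float], x0: float, step: float = 0.5,
                   lo: float = -1.0, hi: float = 1.0,
                   iters: int = 60, tol: float = 1e-8) -> float:
    """Minimise f on [lo, hi] with a simplified two-point (1-D) simplex search.

    Simplification (assumption of this sketch): whenever reflection fails to
    beat the best point, the worst point is contracted towards the best one.
    """
    def clip(x: float) -> float:
        return max(lo, min(hi, x))

    a, b = clip(x0), clip(x0 + step)
    if a == b:                      # start point sat on the upper bound
        b = clip(x0 - step)
    fa, fb = f(a), f(b)
    for _ in range(iters):
        if fb < fa:                 # keep a as the best point, b as the worst
            a, b, fa, fb = b, a, fb, fa
        if abs(b - a) < tol:
            break
        # In 1-D the centroid of "all points except the worst" is just a.
        r = clip(a + (a - b))       # reflection of the worst point through a
        fr = f(r)
        if fr < fa:
            e = clip(a + 2.0 * (a - b))   # expansion further along that direction
            fe = f(e)
            b, fb = (e, fe) if fe < fr else (r, fr)
        else:
            c = a + 0.5 * (b - a)   # inside contraction towards the best point
            b, fb = c, f(c)
    return a if fa <= fb else b


def toy_q(state: float, action: float) -> float:
    """Hypothetical value function whose best action is 0.5 * state."""
    return -(action - 0.5 * state) ** 2


def select_action(q: Callable[[float, float], float], state: float) -> float:
    """Greedy action: maximise q(state, .) by minimising its negation."""
    return nelder_mead_1d(lambda a: -q(state, a), x0=0.0)


if __name__ == "__main__":
    print(select_action(toy_q, 0.8))   # maximiser lies inside the bounds
    print(select_action(toy_q, 3.0))   # maximiser lies beyond the upper bound
```

No gradient or Hessian of the value function is evaluated anywhere in the search, which is the property the abstract contrasts with NM-SARSA. In a SARSA-style learner, the selected action would then be perturbed for exploration and used in the usual temporal-difference update; those steps are omitted here.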
Keywords
"Learning (artificial intelligence)","Training","Mathematical model","Switches","Reactive power","Newton method","Optimization"
Publisher
ieee
Conference_Titel
Systems, Man, and Cybernetics (SMC), 2015 IEEE International Conference on
Type
conf
DOI
10.1109/SMC.2015.364
Filename
7379496