A reinforcement learning approach to on-line optimal control

Author

An, P.E. ; Aslam-Mir, S. ; Brown, M. ; Harris, C.J.

Author_Institution

Dept. of Aeronaut. & Astronaut., Southampton Univ., UK

Volume

4

fYear

1994

fDate

27 Jun-2 Jul 1994

Firstpage

2465

Abstract

Presents a hybrid control architecture for solving on-line optimal control problems. In this architecture, the control law is dynamically scheduled between a reinforcement controller and a stabilizing controller so that the closed-loop performance is smoothly transformed from a reactive behavior to a predictive one. Based on a modified Q-learning technique, the reinforcement controller is made of two components: a policy function and a Q function. The policy function is explicitly incorporated so as to bypass the minimum operator normally required for selecting actions and updating the Q function. This architecture is then applied to a repetitive operation using a second-order linear-time-variant plant with a nonlinear control structure. In this operation, the reinforcement signals are based on set-point errors and the reinforcement controller is generalized using second-order B-spline networks. This example illustrates how, for a non-optimally tuned stabilizing controller, the closed-loop performance can be bootstrapped with the use of reinforcement learning. Results show that the set-point performance of the hybrid controller is improved over that of the fixed-structure controller by discovering better control strategies which compensate for the non-optimal gains and nonlinear control structure.
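The two ideas in the abstract, bypassing the minimum operator with an explicit policy and scheduling between two controllers, can be illustrated with a minimal sketch. It is not the authors' implementation: the tabular Q and policy stand in for the B-spline networks, and the function names, the linear blending rule, the step size `alpha`, and the discount `gamma` are all assumptions made for illustration.

```python
# Illustrative sketch only; tables replace the paper's B-spline networks.

def td_target_standard(Q, cost, s_next, gamma):
    """Standard Q-learning target in cost form: needs a min over all actions."""
    return cost + gamma * min(Q[s_next])

def td_target_with_policy(Q, policy, cost, s_next, gamma):
    """Target using an explicit policy: Q is evaluated only at policy[s_next]."""
    return cost + gamma * Q[s_next][policy[s_next]]

def q_update(Q, s, a, target, alpha):
    """Move Q[s][a] a fraction alpha toward the target, in place."""
    Q[s][a] += alpha * (target - Q[s][a])

def scheduled_control(u_stabilizing, u_reinforcement, weight):
    """Assumed linear blend: weight 0 is purely stabilizing, 1 purely reinforcement."""
    w = min(max(weight, 0.0), 1.0)
    return (1.0 - w) * u_stabilizing + w * u_reinforcement

# Hypothetical toy problem: 3 states, 2 actions.
Q = [[1.0, 2.0], [0.5, 0.2], [3.0, 4.0]]
policy = [0, 1, 0]  # policy[s] is the action index chosen in state s

target = td_target_with_policy(Q, policy, cost=1.0, s_next=1, gamma=0.9)
q_update(Q, s=0, a=0, target=target, alpha=0.5)
u = scheduled_control(u_stabilizing=2.0, u_reinforcement=4.0, weight=0.25)
```

When the policy table happens to be greedy with respect to Q, as in this toy case, the two targets coincide; the policy-based form simply avoids searching over actions, which matters when the action space is continuous.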

Keywords

learning (artificial intelligence); linear systems; nonlinear control systems; optimal control; splines (mathematics); time-varying systems; closed-loop performance; hybrid control architecture; modified Q-learning technique; nonlinear control structure; nonoptimally tuned stabilizing controller; online optimal control; reactive behavior; reinforcement controller; reinforcement learning; repetitive operation; second-order B-splines networks; second-order linear-time-variant plant; set-point errors; stabilizing controller; Control systems; Costs; Dynamic scheduling; Error correction; Kinematics; Nonlinear control systems; Optimal control; Sampling methods; Spline; Supervised learning;

fLanguage

English

Publisher

IEEE

Conference_Titel

Neural Networks, 1994. IEEE World Congress on Computational Intelligence., 1994 IEEE International Conference on

Conference_Location

Orlando, FL

Print_ISBN

0-7803-1901-X

Type

conf

DOI

10.1109/ICNN.1994.374607

Filename

374607