Title :
On-policy Q-learning for adaptive optimal control
Author :
Jha, Sumit Kumar ; Bhasin, Shubhendu
Author_Institution :
Dept. of Electr. Eng., Indian Inst. of Technol. Delhi, New Delhi, India
Abstract :
This paper presents a novel on-policy Q-learning approach for finding the optimal control policy online for continuous-time linear time-invariant (LTI) systems with completely unknown dynamics. The proposed method estimates the unknown parameters of the optimal control policy from a fixed-point equation involving the Q-function. Gradient-based update laws, derived from the minimization of the Bellman error, achieve online parameter adaptation under a persistence of excitation condition. A novel asymptotically convergent state derivative estimator is presented to ensure that the proposed method does not require knowledge of the system dynamics. Simulation results are presented to validate the theoretical development.
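The gradient-based update law described in the abstract can be illustrated with a minimal sketch. This is not the authors' exact algorithm; it shows the generic mechanism of driving a linear-in-parameters Bellman-style prediction error to zero with a normalized gradient law under a persistently exciting regressor. All names (`W_true`, `gamma`, the regressor `phi`) are illustrative assumptions.

```python
import numpy as np

# Sketch (assumed setup, not the paper's algorithm): estimate unknown
# parameters W_true of a linear-in-parameters model from the prediction
# error e = W_hat^T phi - target, using a normalized gradient update law.
W_true = np.array([2.0, -1.0, 0.5])   # unknown parameters (e.g. Q-function weights)
W_hat = np.zeros(3)                   # online estimate
gamma, dt = 5.0, 0.01                 # adaptation gain, integration step

for k in range(20000):
    t = k * dt
    # persistently exciting regressor: distinct frequencies plus a DC term
    phi = np.array([np.sin(t), np.cos(2 * t), np.sin(3 * t) + 0.5])
    target = W_true @ phi             # measured quantity
    e = W_hat @ phi - target          # Bellman-like prediction error
    # normalized gradient law: W_hat_dot = -gamma * phi * e / (1 + phi^T phi)
    W_hat += -gamma * phi * e / (1.0 + phi @ phi) * dt

print(np.round(W_hat, 3))             # approaches W_true as t grows
```

Under persistence of excitation, the parameter error decays exponentially, which is the convergence mechanism the abstract invokes; the paper additionally replaces the measured target with quantities built from the estimated state derivative so no model knowledge is needed.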
Keywords :
adaptive control; continuous time systems; gradient methods; learning systems; linear systems; optimal control; state estimation; Bellman error; Q-function; adaptive optimal control; asymptotically convergent state derivative estimator; continuous-time linear time invariant systems; excitation condition; fixed point equation; gradient-based update laws; on-policy Q-learning approach; optimal control policy; unknown dynamics; Adaptation models; Adaptive systems; Convergence; Equations; Estimation error; Mathematical model; Optimal control; Q-learning; adaptive optimal control; on-policy method
Conference_Titel :
Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 2014 IEEE Symposium on
Conference_Location :
Orlando, FL
DOI :
10.1109/ADPRL.2014.7010649