Title of article :
On integral generalized policy iteration for continuous-time linear quadratic regulations
Author/Authors :
Lee, Jae Young; Park, Jin Bae; Choi, Yoon Ho
Issue Information :
Journal issue, serial year 2014
Pages :
15
From page :
475
To page :
489
Abstract :
This paper mathematically analyzes integral generalized policy iteration (I-GPI) algorithms applied to a class of continuous-time linear quadratic regulation (LQR) problems with an unknown system matrix A. GPI is the general scheme of interleaving the policy evaluation and policy improvement steps of policy iteration (PI) to compute the optimal policy. We first introduce the update horizon ℏ, and then show that (i) all I-GPI methods with the same ℏ can be considered equivalent, and (ii) the value function approximated in the policy evaluation step converges monotonically to the exact one as ℏ → ∞. This reveals the relation between the computational complexity and the update (or time) horizon of I-GPI, as well as the relation between I-PI and I-GPI in the limit ℏ → ∞. We also identify and discuss two modes of convergence of I-GPI: in one mode it behaves like PI, and in the other it behaves like value iteration for discrete-time LQR and like infinitesimal GPI (ℏ → 0). From these results, a new classification of integral reinforcement learning methods is formed with respect to ℏ. Two matrix inequality conditions for stability, the region of local monotone convergence, and data-driven (adaptive) implementation methods are also provided with detailed discussion. Numerical simulations are carried out for verification and further investigation.
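The following is a minimal, model-based sketch in Python of the update scheme the abstract describes: the policy evaluation step integrates the Lyapunov differential equation of the current policy only over a finite update horizon hbar (warm-started from the previous value matrix), and the policy improvement step is the standard LQR gain update. The names igpi_lqr, hbar, K0, and iters are illustrative, and the sketch assumes A is known, whereas the paper's I-GPI is data-driven and avoids A; it is meant only to show the role of ℏ, not the paper's implementation.

import numpy as np
from scipy.integrate import solve_ivp

def igpi_lqr(A, B, Q, R, K0, hbar, iters):
    # Illustrative generalized policy iteration for the CT-LQR problem
    # xdot = A x + B u with cost  integral of (x'Qx + u'Ru) dt.
    n = A.shape[0]
    P = np.zeros((n, n))          # current value-matrix estimate
    K = K0.copy()                 # current policy gain, u = -K x
    Rinv = np.linalg.inv(R)
    for _ in range(iters):
        Ak = A - B @ K            # closed-loop matrix of current policy
        Qk = Q + K.T @ R @ K      # stage cost under current policy
        # Partial policy evaluation: integrate the Lyapunov ODE
        #   dP/dt = Ak' P + P Ak + Qk   over [0, hbar],
        # warm-started from the previous P. Letting hbar -> infinity
        # solves the Lyapunov equation exactly, i.e. recovers PI.
        ode = lambda t, p: (Ak.T @ p.reshape(n, n)
                            + p.reshape(n, n) @ Ak + Qk).ravel()
        P = solve_ivp(ode, (0.0, hbar), P.ravel()).y[:, -1].reshape(n, n)
        K = Rinv @ B.T @ P        # policy improvement
    return P, K

In this sketch a large hbar makes each iteration an (almost) exact policy evaluation, so the scheme behaves like PI, while a small hbar yields many small value updates, matching the value-iteration-like mode discussed in the abstract.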
Keywords :
LQR, generalized policy iteration, reinforcement learning, adaptive control, optimization under uncertainties
Journal title :
Automatica
Serial Year :
2014
Record number :
1449655