DocumentCode
669407
Title
Policy iteration-mode monotone convergence of generalized policy iteration for discrete-time linear systems
Author
Tae Yoon Chun ; Jin Bae Park ; Yoon Ho Choi
Author_Institution
Dept. of Electr. Eng., Yonsei Univ., Seoul, South Korea
fYear
2013
fDate
20-23 Oct. 2013
Firstpage
454
Lastpage
458
Abstract
This paper presents the policy iteration (PI)-mode monotone convergence and stability properties of generalized policy iteration (GPI) algorithms for discrete-time (DT) linear systems. GPI is one of the reinforcement-learning-based dynamic programming (DP) methods for solving optimal control problems; it interleaves policy evaluation and policy improvement steps. To analyze the convergence and stability of GPI, several equivalent equations are derived. As a result, the PI-mode monotone convergence (i.e., behavior like PI) and stability of the GPI algorithm are proved under certain initial conditions that are closely related to the Lyapunov approach. Finally, numerical simulations are performed to verify the proposed convergence and stability properties.
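For orientation, the following is a minimal sketch of GPI for the DT linear-quadratic setting the abstract describes; it is not code from the paper. The function name gpi_lqr, the system matrices, and the sweep counts n_eval and n_outer are illustrative assumptions. Setting n_eval = 1 recovers value iteration, while letting n_eval grow large recovers policy iteration, which is the "PI-mode" regime the paper studies.

```python
import numpy as np

def gpi_lqr(A, B, Q, R, K0, n_eval=3, n_outer=50):
    # Sketch of generalized policy iteration for DT LQR (illustrative, not the paper's code).
    # n_eval = 1 behaves like value iteration; large n_eval behaves like policy iteration.
    n = A.shape[0]
    K, P = K0, np.zeros((n, n))
    for _ in range(n_outer):
        Ac = A - B @ K                 # closed-loop matrix under the current policy u = -K x
        Qc = Q + K.T @ R @ K           # stage cost under the current policy
        # Partial policy evaluation: a finite number of Lyapunov-recursion sweeps
        for _ in range(n_eval):
            P = Qc + Ac.T @ P @ Ac
        # Policy improvement: greedy gain from the current value estimate
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

# Illustrative second-order example; matrices are assumptions, not from the paper.
A = np.array([[0.9, 0.1], [0.0, 0.8]])   # Schur-stable, so K0 = 0 is stabilizing
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)
K0 = np.zeros((1, 2))
K, P = gpi_lqr(A, B, Q, R, K0)
print("approximate optimal gain K:\n", K)
```

Starting from a stabilizing initial gain, as in the sketch above, corresponds to the Lyapunov-related initial conditions under which the paper proves monotone convergence.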
Keywords
Lyapunov methods; discrete time systems; dynamic programming; learning (artificial intelligence); optimal control; stability; GPI; Lyapunov approach; PI-mode monotone convergence; discrete-time linear system; dynamic programming; generalized policy iteration; optimal control problem; policy iteration-mode monotone convergence; reinforcement learning; stability property; Approximation algorithms; Education; Stability analysis; generalized policy iteration; linear quadratic regulator; policy iteration-mode monotone convergence;
fLanguage
English
Publisher
ieee
Conference_Titel
2013 13th International Conference on Control, Automation and Systems (ICCAS)
Conference_Location
Gwangju
ISSN
2093-7121
Print_ISBN
978-89-93215-05-2
Type
conf
DOI
10.1109/ICCAS.2013.6703973
Filename
6703973