Adaptive linear quadratic control using policy iteration

Author

Bradtke, Steven J. ; Ydstie, B. Erik ; Barto, Andrew G.

Author_Institution

Department of Computer and Information Science, University of Massachusetts, Amherst, MA, USA

Volume

3

fYear

1994

fDate

29 June-1 July 1994

Firstpage

3475

Abstract

In this paper we present stability and convergence results for dynamic programming (DP)-based reinforcement learning applied to linear quadratic regulation (LQR). The specific algorithm we analyze is based on Q-learning, and it is proven to converge to an optimal controller provided that the underlying system is controllable and a particular signal vector is persistently excited. This is the first convergence result for DP-based reinforcement learning algorithms for a continuous problem.
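
As a rough illustration of the idea in the abstract, the sketch below runs Q-function-based policy iteration on a small discrete-time LQR problem. It is not the authors' algorithm: it estimates the quadratic Q-function by batch least squares, and the system matrices A and B, the cost weights E and F, the discount factor gamma, and all function names are assumptions chosen for the example. Exploration noise on the control stands in for the persistent-excitation condition.

```python
import numpy as np

def quad_features(z):
    # Upper-triangular entries of z z^T, with off-diagonal terms doubled,
    # so that quad_features(z) @ theta == z^T H z for symmetric H.
    i, j = np.triu_indices(len(z))
    return np.outer(z, z)[i, j] * np.where(i == j, 1.0, 2.0)

def theta_to_H(theta, n):
    # Rebuild the symmetric matrix H from its upper-triangular entries.
    H = np.zeros((n, n))
    i, j = np.triu_indices(n)
    H[i, j] = theta
    H[j, i] = theta
    return H

def evaluate_policy(A, B, E, F, K, gamma, rng, steps=400):
    # Fit H in Q_K(x, u) = [x; u]^T H [x; u] from the Bellman equation
    # Q_K(x, u) = cost(x, u) + gamma * Q_K(x', K x').
    n, m = B.shape
    Phi, y = [], []
    x = rng.standard_normal(n)
    for _ in range(steps):
        u = K @ x + rng.standard_normal(m)  # exploration noise (assumed)
        cost = x @ E @ x + u @ F @ u
        x_next = A @ x + B @ u
        z = np.concatenate([x, u])
        z_next = np.concatenate([x_next, K @ x_next])
        Phi.append(quad_features(z) - gamma * quad_features(z_next))
        y.append(cost)
        x = x_next
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    return theta_to_H(theta, n + m)

def policy_iteration(A, B, E, F, K0, gamma=0.95, iters=10, seed=0):
    # Alternate Q-function estimation and greedy policy improvement.
    # K0 is assumed to be a stabilizing feedback gain (u = K0 x).
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    K = K0
    for _ in range(iters):
        H = evaluate_policy(A, B, E, F, K, gamma, rng)
        K = -np.linalg.solve(H[n:, n:], H[n:, :n])  # argmin over u of Q
    return K

def riccati_gain(A, B, E, F, gamma=0.95, iters=2000):
    # Model-based reference: fixed-point iteration on the discounted
    # Riccati equation, used here only to check the learned gain.
    P = E.copy()
    for _ in range(iters):
        G = F + gamma * B.T @ P @ B
        K = -gamma * np.linalg.solve(G, B.T @ P @ A)
        P = E + gamma * A.T @ P @ (A + B @ K)
    return K
```

Because the example system is deterministic, the Bellman equation holds exactly on every sample, so the least-squares fit recovers the Q-function of the current policy whenever the regressors are sufficiently rich. A recursive estimator could replace the batch fit without changing the improvement step.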

Keywords

adaptive control; discrete time systems; dynamic programming; intelligent control; iterative methods; learning (artificial intelligence); linear quadratic control; multivariable systems; stability; Q-learning; adaptive linear quadratic control; convergence; dynamic programming-based reinforcement learning; multivariable system; optimal controller; policy iteration; signal vector; Computer science; Control systems; Cost function; Feedback control; Learning; Optimal control; Programmable control; Symmetric matrices; Vectors

fLanguage

English

Publisher

IEEE

Conference_Titel

American Control Conference, 1994

Print_ISBN

0-7803-1783-1

Type

conf

DOI

10.1109/ACC.1994.735224

Filename

735224