  • DocumentCode
    42342
  • Title
    Integral Reinforcement Learning for Continuous-Time Input-Affine Nonlinear Systems With Simultaneous Invariant Explorations
  • Author
    Jae Young Lee; Jin Bae Park; Yoon Ho Choi
  • Author_Institution
    Dept. of Electr. & Electron. Eng., Yonsei Univ., Seoul, South Korea
  • Volume
    26
  • Issue
    5
  • fYear
    2015
  • fDate
May 2015
  • Firstpage
    916
  • Lastpage
    932
  • Abstract
This paper focuses on a class of reinforcement learning (RL) algorithms, named integral RL (I-RL), that solve continuous-time (CT) nonlinear optimal control problems with input-affine system dynamics. First, we extend the concepts of exploration, integral temporal difference, and invariant admissibility to the target CT nonlinear system, which is governed by a control policy plus a probing signal called an exploration. Then, we show input-to-state stability (ISS) and invariant admissibility of the closed-loop systems under the policies generated by the integral policy iteration (I-PI) and invariantly admissible PI (IA-PI) methods. Based on these results, three online I-RL algorithms, named explorized I-PI and integral $Q$-learning I and II, are proposed, all of which generate the same convergent sequences as I-PI and IA-PI under the required excitation condition on the exploration. All of the proposed methods are partially or completely model free, and all can simultaneously explore the state space in a stable manner during the online learning process. ISS, invariant admissibility, and convergence properties of the proposed methods are also investigated, and in relation to these, we present design principles of the exploration for safe learning. Neural-network-based implementation methods for the proposed schemes are also presented. Finally, several numerical simulations are carried out to verify the effectiveness of the proposed methods.
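    The abstract centers on integral policy iteration: evaluate the current policy from the integral temporal difference measured along trajectory segments, then improve the policy from the fitted value function. As a hedged illustration only, the Python sketch below applies that integral-TD/PI idea to the linear-quadratic special case, where I-PI reduces to a data-driven Kleinman iteration; the system matrices, interval length T, segment counts, and forward-Euler simulation are arbitrary choices for demonstration and are not taken from the paper, which treats general input-affine nonlinear dynamics with neural-network approximation.

    import numpy as np
    from scipy.linalg import solve_continuous_are

    # Illustrative 2-state linear system and quadratic cost (not from the paper).
    A = np.array([[0.0, 1.0],
                  [-1.0, -2.0]])
    B = np.array([[0.0],
                  [1.0]])
    Q = np.eye(2)
    R = np.array([[1.0]])

    dt = 1e-4   # forward-Euler integration step
    T = 0.05    # length of one integral-reinforcement interval

    def features(x):
        # Basis for V(x) = x'Px with P symmetric: weights are [p11, 2*p12, p22].
        return np.array([x[0]**2, x[0]*x[1], x[1]**2])

    def rollout(x, K):
        # Integrate the closed loop u = -Kx over [t, t+T] (forward Euler);
        # return the end state and the accumulated running cost.
        cost = 0.0
        for _ in range(int(T / dt)):
            u = -K @ x
            cost += (x @ Q @ x + u @ R @ u) * dt
            x = x + (A @ x + B @ u) * dt
        return x, cost

    rng = np.random.default_rng(0)
    K = np.zeros((1, 2))   # initial admissible policy (A is Hurwitz here)

    for _ in range(8):
        # Policy evaluation: fit the integral temporal difference
        #   V(x(t)) - V(x(t+T)) = cost accumulated over [t, t+T]
        # by least squares; excitation comes from varied initial states.
        Phi, y = [], []
        for _ in range(30):
            x0 = rng.uniform(-1.0, 1.0, size=2)
            x1, c = rollout(x0, K)
            Phi.append(features(x0) - features(x1))
            y.append(c)
        w, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
        P = np.array([[w[0], w[1] / 2.0],
                      [w[1] / 2.0, w[2]]])
        # Policy improvement: uses B only, i.e., partially model-free.
        K = np.linalg.inv(R) @ B.T @ P

    print("learned P:\n", P)
    print("Riccati P:\n", solve_continuous_are(A, B, Q, R))

    Note that this sketch obtains excitation from random initial conditions rather than an additive probing signal; per the abstract, the paper's explorized I-PI and integral Q-learning variants instead inject an exploration into the input and account for it so that the learned sequences still converge and the closed loop remains stable (ISS) during learning.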
  • Keywords
    closed loop systems; continuous time systems; convergence of numerical methods; iterative methods; learning (artificial intelligence); neurocontrollers; nonlinear control systems; optimal control; stability; CT nonlinear system; IA-PI method; closed-loop systems; completely model free; continuous-time input-affine nonlinear systems; continuous-time nonlinear optimal control problems; convergent sequence generation; explorized I-PI algorithm; input-affine system dynamics; input-to-state stability; integral Q-learning algorithm; integral policy iteration; integral reinforcement learning algorithm; integral temporal difference; invariantly admissible PI method; neural-network-based implementation methods; numerical simulations; partially model free; probing signal; simultaneous invariant explorations; Convergence; Equations; Heuristic algorithms; Nonlinear systems; Optimal control; Stability analysis; Adaptive optimal control; Q-learning; continuous-time (CT); exploration; policy iteration (PI); reinforcement learning (RL)
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Neural Networks and Learning Systems
  • Publisher
    IEEE
  • ISSN
    2162-237X
  • Type
    jour
  • DOI
    10.1109/TNNLS.2014.2328590
  • Filename
    6882245