• DocumentCode
    2583993
  • Title
    Integral Reinforcement Learning for online computation of feedback Nash strategies of nonzero-sum differential games
  • Author
    Vrabie, Draguna; Lewis, Frank
  • Author_Institution
    Automation & Robotics Research Institute, University of Texas at Arlington, Fort Worth, TX, USA
  • fYear
    2010
  • fDate
    15-17 Dec. 2010
  • Firstpage
    3066
  • Lastpage
    3071
  • Abstract
    This paper presents an Approximate/Adaptive Dynamic Programming (ADP) algorithm that finds, online, the Nash equilibrium of two-player nonzero-sum differential games with linear dynamics and infinite-horizon quadratic cost. Each player uses Integral Reinforcement Learning (IRL) to compute online the infinite-horizon value function it associates with any given set of feedback control policies. The online algorithm is shown to be mathematically equivalent to an offline iterative method, previously introduced in the literature, that solves the set of coupled algebraic Riccati equations (AREs) underlying the game using complete knowledge of the system dynamics. We show how ADP techniques extend this offline method to an online solution that does not require complete knowledge of the system dynamics. The two participants in the continuous-time differential game compete in real time, and the feedback Nash control strategies are determined from data measured online from the system. The algorithm is built on the interplay between a learning phase, in which each player learns online the value it associates with a given set of play policies, and a policy update step, in which each player updates its policy to decrease its cost. The players learn concurrently. The feasibility of the ADP scheme is demonstrated in simulation.
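    To make the "offline iterative method" concrete, the following is a minimal sketch of a model-based policy iteration for a two-player linear-quadratic nonzero-sum game, written under stated assumptions rather than taken from the paper. It assumes dynamics dx/dt = A x + B1 u1 + B2 u2, linear feedback policies u_i = -K_i x, and costs J_i = integral of (x'Q_i x + u_i'R_ii u_i + u_j'R_ij u_j) dt, which is a standard form for this problem class. Each iteration evaluates both players' value matrices P_i with a Lyapunov equation (the step that needs A) and then updates both gains simultaneously. The function names (`lyap`, `nzs_policy_iteration`, `coupled_are_residuals`) and the example matrices are hypothetical, and convergence is assumed here for a weakly coupled, open-loop-stable example, not guaranteed in general.

```python
import numpy as np


def lyap(Ac, M):
    """Solve Ac' P + P Ac + M = 0 for P using the Kronecker-product form."""
    n = Ac.shape[0]
    I = np.eye(n)
    # Column-major vec: vec(Ac' P) = (I kron Ac') vec(P), vec(P Ac) = (Ac' kron I) vec(P).
    L = np.kron(I, Ac.T) + np.kron(Ac.T, I)
    p = np.linalg.solve(L, -M.flatten(order="F"))
    P = p.reshape(n, n, order="F")
    return 0.5 * (P + P.T)  # remove round-off asymmetry


def nzs_policy_iteration(A, B1, B2, Q1, Q2, R11, R12, R21, R22,
                         K1, K2, tol=1e-10, max_iter=200):
    """Model-based policy iteration for a two-player LQ nonzero-sum game.

    Assumes (K1, K2) is an initial stabilizing pair of feedback gains.
    """
    for it in range(max_iter):
        Ac = A - B1 @ K1 - B2 @ K2
        if np.max(np.linalg.eigvals(Ac).real) >= 0:
            raise ValueError("current policies are not stabilizing")
        # Policy evaluation: each player's quadratic value x' P_i x for the current gains.
        M1 = Q1 + K1.T @ R11 @ K1 + K2.T @ R12 @ K2
        M2 = Q2 + K2.T @ R22 @ K2 + K1.T @ R21 @ K1
        P1, P2 = lyap(Ac, M1), lyap(Ac, M2)
        # Policy improvement: both players update at the same time.
        K1_new = np.linalg.solve(R11, B1.T @ P1)
        K2_new = np.linalg.solve(R22, B2.T @ P2)
        change = max(np.abs(K1_new - K1).max(), np.abs(K2_new - K2).max())
        K1, K2 = K1_new, K2_new
        if change < tol:
            return P1, P2, K1, K2, it + 1
    raise RuntimeError("policy iteration did not converge")


def coupled_are_residuals(A, B1, B2, Q1, Q2, R11, R12, R21, R22, P1, P2):
    """Max-abs residual of each coupled ARE, with K_i = R_ii^{-1} B_i' P_i."""
    K1 = np.linalg.solve(R11, B1.T @ P1)
    K2 = np.linalg.solve(R22, B2.T @ P2)
    Ac = A - B1 @ K1 - B2 @ K2
    r1 = Ac.T @ P1 + P1 @ Ac + Q1 + K1.T @ R11 @ K1 + K2.T @ R12 @ K2
    r2 = Ac.T @ P2 + P2 @ Ac + Q2 + K2.T @ R22 @ K2 + K1.T @ R21 @ K1
    return np.abs(r1).max(), np.abs(r2).max()


# Hypothetical example data (not from the paper); A is stable, so K = 0 is stabilizing.
A = np.array([[-2.0, 1.0], [0.0, -1.0]])
B1 = np.array([[0.0], [1.0]])
B2 = np.array([[1.0], [0.0]])
Q1, Q2 = np.eye(2), np.eye(2)
R11, R22 = np.array([[1.0]]), np.array([[1.0]])
R12, R21 = np.array([[0.5]]), np.array([[0.5]])

P1, P2, K1, K2, iters = nzs_policy_iteration(
    A, B1, B2, Q1, Q2, R11, R12, R21, R22, np.zeros((1, 2)), np.zeros((1, 2)))
print("iterations:", iters)
print("ARE residuals:",
      coupled_are_residuals(A, B1, B2, Q1, Q2, R11, R12, R21, R22, P1, P2))
```

    The `lyap` helper could equally be replaced by `scipy.linalg.solve_continuous_lyapunov(Ac.T, -M)`. In an IRL-style variant, the policy evaluation step is the part that would instead be fitted from measured state trajectories, which is what removes the need to know A; in this sketch the gain update still uses B1 and B2.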
  • Keywords
    Riccati equations; differential games; dynamic programming; iterative methods; learning (artificial intelligence); approximate-adaptive dynamic programming algorithm; coupled algebraic Riccati equations; feedback Nash strategies; infinite horizon quadratic cost; infinite horizon value function; integral reinforcement learning; linear dynamics; offline iterative method; online computation; two-player nonzero-sum differential games; Cost function; Games; Heuristic algorithms; Infinite horizon; Learning; Nash equilibrium;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2010 49th IEEE Conference on Decision and Control (CDC)
  • Conference_Location
    Atlanta, GA
  • ISSN
    0743-1546
  • Print_ISBN
    978-1-4244-7745-6
  • Type
    conf
  • DOI
    10.1109/CDC.2010.5718152
  • Filename
    5718152