• DocumentCode
    8418
  • Title
    Off-policy reinforcement learning with Gaussian processes
  • Author
    Chowdhary, Girish ; Liu, Miao ; Grande, Robert ; Walsh, Thomas ; How, Jonathan ; Carin, Lawrence
  • Author_Institution
    Oklahoma State Univ., Stillwater, OK, USA
  • Volume
    1
  • Issue
    3
  • fYear
    2014
  • fDate
    Jul-14
  • Firstpage
    227
  • Lastpage
    238
  • Abstract
    An off-policy Bayesian nonparametric approximate reinforcement learning framework, termed GPQ, that employs a Gaussian process (GP) model of the value (Q) function is presented for both the batch and online settings. Sufficient conditions on GP hyperparameter selection are established that guarantee convergence of off-policy GPQ in the batch setting, and theoretical and practical extensions are provided for the online case. Empirical results show that GPQ achieves competitive learning speed, in addition to offering convergence guarantees and the ability to choose its own basis locations automatically.
  • Keywords
    Bayes methods; Gaussian processes; learning (artificial intelligence); GP hyperparameter selection; GPQ; Gaussian processes; batch setting; off-policy Bayesian nonparametric approximate reinforcement learning framework; online setting; Approximation algorithms; Convergence; Function approximation; Gaussian processes; Learning (artificial intelligence); Bayesian nonparametric; Gaussian processes; Reinforcement learning; off-policy learning;
  • fLanguage
    English
  • Journal_Title
    Automatica Sinica, IEEE/CAA Journal of
  • Publisher
    IEEE
  • ISSN
    2329-9266
  • Type
    jour
  • DOI
    10.1109/JAS.2014.7004680
  • Filename
    7004680