DocumentCode
8418
Title
Off-policy reinforcement learning with Gaussian processes
Author
Chowdhary, Girish ; Liu, Miao ; Grande, Robert ; Walsh, Thomas ; How, Jonathan ; Carin, Lawrence
Author_Institution
Oklahoma State Univ., Stillwater, OK, USA
Volume
1
Issue
3
fYear
2014
fDate
Jul-14
Firstpage
227
Lastpage
238
Abstract
An off-policy Bayesian nonparametric approximate reinforcement learning framework, termed GPQ, that employs a Gaussian process (GP) model of the value (Q) function is presented in both the batch and online settings. Sufficient conditions on GP hyperparameter selection are established to guarantee convergence of off-policy GPQ in the batch setting, and theoretical and practical extensions are provided for the online case. Empirical results demonstrate that GPQ has competitive learning speed, in addition to its convergence guarantees and its ability to automatically choose its own basis locations.
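The abstract describes GPQ only at a high level. As a rough illustration of the batch idea (a GP model of Q fitted to Bellman targets built from off-policy transitions), here is a minimal Python/NumPy sketch under stated assumptions. The class name `GPQSketch`, the squared-exponential kernel, the hyperparameter values, the fixed basis set (every observed state-action pair), and the toy 5-state chain are all hypothetical and not taken from the paper. It is plain fitted Q-iteration with a GP posterior mean as the regressor, and it does not check the paper's sufficient conditions on hyperparameter selection.

```python
import numpy as np


def rbf_kernel(X, Y, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between rows of X (n, d) and Y (m, d)."""
    d2 = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return signal_var * np.exp(-0.5 * np.maximum(d2, 0.0) / length_scale**2)


class GPQSketch:
    """Hypothetical batch sketch: fitted Q-iteration with a GP posterior mean as Q.

    Not the paper's algorithm; the kernel, hyperparameters and the fixed
    training inputs (one basis point per observed state-action pair) are assumptions.
    """

    def __init__(self, actions, gamma=0.9, noise_var=1e-3, length_scale=1.0):
        self.actions = np.asarray(actions, dtype=float)
        self.gamma = gamma
        self.noise_var = noise_var
        self.length_scale = length_scale
        self.X = None      # GP training inputs (state-action pairs)
        self.alpha = None  # weights such that Q(z) = k(z, X) @ alpha

    @staticmethod
    def _features(states, actions):
        # Concatenate state and action into a single GP input vector
        states = np.atleast_1d(np.asarray(states, dtype=float))
        actions = np.atleast_1d(np.asarray(actions, dtype=float))
        return np.column_stack([states, actions])

    def q(self, states, actions):
        """GP posterior mean of Q at the given state-action pairs."""
        Z = self._features(states, actions)
        return rbf_kernel(Z, self.X, self.length_scale) @ self.alpha

    def max_q(self, states):
        """max_a Q(s, a) over the finite action set, for each state."""
        states = np.atleast_1d(np.asarray(states, dtype=float))
        vals = np.stack(
            [self.q(states, np.full(len(states), a)) for a in self.actions], axis=1
        )
        return vals.max(axis=1)

    def fit(self, s, a, r, s_next, n_iters=200):
        """Off-policy batch fit: (s, a, r, s') may come from any behaviour policy."""
        self.X = self._features(s, a)
        self.alpha = np.zeros(len(self.X))  # zero prior mean, so Q starts at 0
        K = rbf_kernel(self.X, self.X, self.length_scale)
        L = np.linalg.cholesky(K + self.noise_var * np.eye(len(self.X)))
        r = np.asarray(r, dtype=float)
        for _ in range(n_iters):
            # Bellman optimality targets under the current GP mean
            targets = r + self.gamma * self.max_q(s_next)
            # alpha = (K + noise_var * I)^{-1} targets, via the Cholesky factor
            self.alpha = np.linalg.solve(L.T, np.linalg.solve(L, targets))
        return self


# Hypothetical toy 5-state chain: action +1 moves right, -1 moves left
# (clipped to [0, 4]); reward 1 whenever the next state is 4.
states = np.repeat(np.arange(5.0), 2)
acts = np.tile([-1.0, 1.0], 5)
nxt = np.clip(states + acts, 0.0, 4.0)
rew = (nxt == 4.0).astype(float)

agent = GPQSketch([-1.0, 1.0], gamma=0.9, noise_var=1e-3, length_scale=0.5)
agent.fit(states, acts, rew, nxt)
print(agent.q([2.0, 2.0], [-1.0, 1.0]))  # the second entry (move right) should be larger
```

In this toy setup the noise variance is small relative to the kernel matrix's eigenvalues, so the GP mean nearly interpolates the Bellman targets and the loop behaves much like value iteration at the training points; that is a choice made for the example, not a restatement of the paper's convergence conditions.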
Keywords
Bayes methods; Gaussian processes; learning (artificial intelligence); GP hyperparameter selection; GPQ; Gaussian processes; batch setting; off-policy Bayesian nonparametric approximate reinforcement learning framework; online setting; Approximation algorithms; Convergence; Function approximation; Gaussian processes; Learning (artificial intelligence); Bayesian nonparametric; Gaussian processes; Reinforcement learning; off-policy learning;
fLanguage
English
Journal_Title
Automatica Sinica, IEEE/CAA Journal of
Publisher
ieee
ISSN
2329-9266
Type
jour
DOI
10.1109/JAS.2014.7004680
Filename
7004680