Regional Information Center for Science and Technology - Off-policy reinforcement learning with Gaussian processes

DocumentCode :

8418

Title :

Off-policy reinforcement learning with Gaussian processes

Author :

Chowdhary, Girish ; Miao Liu ; Grande, Robert ; Walsh, Thomas ; How, Jonathan ; Carin, Lawrence

Author_Institution :

Oklahoma State Univ., Stillwater, OK, USA

Volume :

Issue :

fYear :

2014

fDate :

July 2014

Firstpage :

227

Lastpage :

238

Abstract :

An off-policy Bayesian nonparametric approximate reinforcement learning framework, termed GPQ, that employs a Gaussian process (GP) model of the value (Q) function is presented in both the batch and online settings. Sufficient conditions on GP hyperparameter selection are established to guarantee convergence of off-policy GPQ in the batch setting, and theoretical and practical extensions are provided for the online case. Empirical results demonstrate that GPQ has competitive learning speed in addition to its convergence guarantees and its ability to automatically choose its own basis locations.
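
To make the idea of "a GP model of the Q function in the batch setting" concrete, the following is a minimal sketch only. It assumes a generic fitted-Q-style iteration with a squared-exponential kernel and is not the authors' GPQ algorithm: it performs no sparsification or automatic basis selection and does not check the convergence conditions the abstract mentions. All function names, the toy chain environment, and the hyperparameter values are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    # Squared-exponential kernel between the rows of A and the rows of B.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gp_posterior_mean(X_train, y_train, X_query, length_scale=1.0, noise_var=0.1):
    # Standard GP regression mean: K_* (K + noise_var * I)^{-1} y.
    K = rbf_kernel(X_train, X_train, length_scale) + noise_var * np.eye(len(X_train))
    alpha = np.linalg.solve(K, y_train)
    return rbf_kernel(X_query, X_train, length_scale) @ alpha

def batch_gp_q_iteration(states, actions, rewards, next_states, action_set,
                         gamma=0.9, n_iters=20, length_scale=0.3, noise_var=0.1):
    # Inputs to the GP are (state, action) pairs; targets start as the rewards.
    X = np.column_stack([states, actions])
    q_targets = np.asarray(rewards, dtype=float).copy()
    for _ in range(n_iters):
        # Evaluate the current GP estimate of Q at (s', a') for every action a'.
        next_q = np.column_stack([
            gp_posterior_mean(
                X, q_targets,
                np.column_stack([next_states, np.full(len(next_states), a)]),
                length_scale, noise_var)
            for a in action_set
        ])
        # Off-policy (Q-learning style) target: r + gamma * max_a' Q(s', a').
        q_targets = rewards + gamma * next_q.max(axis=1)
    return X, q_targets

# Hypothetical toy problem: a 1-D chain on [0, 1] with a reward near the right end.
rng = np.random.default_rng(0)
n = 60
s = rng.uniform(0.0, 1.0, size=n)
a = rng.choice([-1.0, 1.0], size=n)          # behaviour policy: uniformly random
s_next = np.clip(s + 0.1 * a, 0.0, 1.0)
r = (s_next >= 0.9).astype(float)

X_fit, q_fit = batch_gp_q_iteration(s, a, r, s_next, action_set=(-1.0, 1.0))

# Query the fitted GP at one state for both actions.
query = np.array([[0.5, -1.0], [0.5, 1.0]])
q_query = gp_posterior_mean(X_fit, q_fit, query, length_scale=0.3, noise_var=0.1)
print("Q(0.5, left), Q(0.5, right):", q_query)
```

The data are collected by a random behaviour policy while the targets use the greedy maximum over actions, which is what makes the update off-policy. The fixed `length_scale` and `noise_var` are arbitrary choices here; the paper's sufficient conditions on hyperparameters are not reproduced.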

Keywords :

Bayes methods; Gaussian processes; learning (artificial intelligence); GP hyperparameter selection; GPQ; batch setting; off-policy Bayesian nonparametric approximate reinforcement learning framework; online setting; Approximation algorithms; Convergence; Function approximation; Bayesian nonparametric; Reinforcement learning; off-policy learning

fLanguage :

English

Journal_Title :

Automatica Sinica, IEEE/CAA Journal of

Publisher :

IEEE

ISSN :

2329-9266

Type :

jour

DOI :

10.1109/JAS.2014.7004680

Filename :

7004680

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=8418