Regional Information Center for Science and Technology - Online least-squares policy iteration for reinforcement learning control

DocumentCode :

3283011

Title :

Online least-squares policy iteration for reinforcement learning control

Author :

Busoniu, L. ; Ernst, D. ; De Schutter, B. ; Babuska, R.

Author_Institution :

Delft Center for Syst. & Control, Delft Univ. of Technol., Delft, Netherlands

fYear :

2010

fDate :

June 30, 2010 - July 2, 2010

Firstpage :

486

Lastpage :

491

Abstract :

Reinforcement learning is a promising paradigm for learning optimal control. We consider policy iteration (PI) algorithms for reinforcement learning, which iteratively evaluate and improve control policies. State-of-the-art least-squares techniques for policy evaluation are sample-efficient and have relaxed convergence requirements. However, they are typically used in offline PI, whereas a central goal of reinforcement learning is to develop online algorithms. Therefore, we propose an online PI algorithm that evaluates policies with the so-called least-squares temporal difference for Q-functions (LSTD-Q). The crucial difference between this online least-squares policy iteration (LSPI) algorithm and its offline counterpart is that, in the online case, policy improvements must be performed once every few state transitions, using only an incomplete evaluation of the current policy. In an extensive experimental evaluation, online LSPI is found to work well for a wide range of its parameters, and to learn successfully in a real-time example. Online LSPI also compares favorably with offline LSPI and with a different flavor of online PI, which, instead of LSTD-Q, employs another least-squares method for policy evaluation.
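The record does not give the paper's update equations, so the following is only a minimal sketch of the idea described in the abstract, under stated assumptions. The five-state chain (`step`), the one-hot basis functions (`phi`), the decaying epsilon-greedy exploration, and the parameter names `k_theta`, `delta`, `eps_min`, and `eps_decay` are all hypothetical choices for illustration, not taken from the paper. LSTD-Q statistics are accumulated from every transition without storing samples, and every `k_theta` transitions the linear system is re-solved, which implicitly improves the greedy policy from an incomplete evaluation. A small pure-Python solver stands in for a linear-algebra library to keep the example self-contained.

```python
import random

# Hypothetical toy problem (an assumption, not from the paper): a 5-state chain.
# Action 0 moves left, action 1 moves right; taking "right" in the last state
# yields reward 1 and restarts the chain at state 0.
N_STATES, N_ACTIONS = 5, 2


def step(s, a):
    if a == 1:
        if s == N_STATES - 1:
            return 0, 1.0
        return s + 1, 0.0
    return max(s - 1, 0), 0.0


def phi(s, a):
    """One-hot (tabular) basis functions over state-action pairs."""
    v = [0.0] * (N_STATES * N_ACTIONS)
    v[s * N_ACTIONS + a] = 1.0
    return v


def greedy(theta, s):
    """Greedy action for Q(s, a) = phi(s, a)^T theta; ties go to the lowest index."""
    return max(range(N_ACTIONS), key=lambda a: theta[s * N_ACTIONS + a])


def solve(a_mat, b_vec):
    """Solve a_mat @ x = b_vec by Gaussian elimination with partial pivoting."""
    n = len(b_vec)
    m = [row[:] + [b_vec[i]] for i, row in enumerate(a_mat)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def online_lspi(n_steps=10000, gamma=0.9, k_theta=10, eps_min=0.3,
                eps_decay=0.999, delta=1e-3, seed=0):
    rng = random.Random(seed)
    d = N_STATES * N_ACTIONS
    # LSTD-Q statistics: A = delta*I + sum phi (phi - gamma*phi')^T, b = sum phi*r.
    a_mat = [[delta if i == j else 0.0 for j in range(d)] for i in range(d)]
    b_vec = [0.0] * d
    theta = [0.0] * d
    s = 0
    for t in range(n_steps):
        eps = max(eps_min, eps_decay ** t)  # decaying epsilon-greedy exploration
        a = rng.randrange(N_ACTIONS) if rng.random() < eps else greedy(theta, s)
        s_next, r = step(s, a)
        f = phi(s, a)
        # Successor action comes from the *current* greedy policy; earlier samples
        # are never re-evaluated, so each policy evaluation is incomplete.
        f_next = phi(s_next, greedy(theta, s_next))
        for i in range(d):
            if f[i] != 0.0:  # skip zero features
                for j in range(d):
                    a_mat[i][j] += f[i] * (f[j] - gamma * f_next[j])
                b_vec[i] += f[i] * r
        if (t + 1) % k_theta == 0:
            # Policy improvement: the greedy policy implicitly switches to the new theta.
            theta = solve(a_mat, b_vec)
        s = s_next
    return theta


theta = online_lspi()
print([greedy(theta, s) for s in range(N_STATES)])  # greedy policy learned online
```

The one-hot features are used only for readability; any basis-function vector of fixed length d could be substituted for `phi`. Re-solving the full system every `k_theta` steps costs O(d^3), so the choice of `k_theta` trades computation against how quickly the policy is improved.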

Keywords :

PI control; adaptive control; iterative methods; learning (artificial intelligence); learning systems; least squares approximations; optimal control; Q-function; least-squares temporal difference; online least-squares policy iteration; optimal control; reinforcement learning control; state transition; Computational efficiency; Control systems; Convergence; Iterative algorithms; Learning; Optimal control; Optimization methods; Performance evaluation; Process control; Stochastic processes

fLanguage :

English

Publisher :

IEEE

Conference_Title :

American Control Conference (ACC), 2010

Conference_Location :

Baltimore, MD

ISSN :

0743-1619

Print_ISBN :

978-1-4244-7426-4

Type :

conf

DOI :

10.1109/ACC.2010.5530856

Filename :

5530856

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3283011