Regional Information Center for Science and Technology - Reinforcement learning combined with human feedback in continuous state and action spaces

DocumentCode :

586568

Title :

Reinforcement learning combined with human feedback in continuous state and action spaces

Author :

Ngo Anh Vien ; Ertel, Wolfgang

Author_Institution :

Inst. of Artificial Intell., Ravensburg-Weingarten Univ. of Appl. Sci., Weingarten, Germany

fYear :

2012

fDate :

7-9 Nov. 2012

Firstpage :

Lastpage :

Abstract :

We consider the problem of extending Training an Agent Manually via Evaluative Reinforcement (TAMER) to continuous state and action spaces. The earlier TAMER framework allows a non-technical human to train an agent through a natural form of human feedback, either negative or positive. The advantages of TAMER have been shown in applications such as training Tetris and Mountain Car with only human feedback, and Cart-pole and Mountain Car with human feedback and environment reward (augmenting reinforcement learning with human feedback). However, those methods were originally designed for discrete state-action or continuous state-discrete action problems. We propose an extension of TAMER, called ACTAMER, that allows both continuous states and actions. The new framework extends the original TAMER to allow the use of any general function approximation of a human trainer's reinforcement signal. Moreover, we investigate the capability of combining ACTAMER with reinforcement learning (RL). The combination of human feedback and RL is studied in two settings: sequential and simultaneous. Our experimental results show that the proposed method successfully allows a human to train an agent in two continuous state-action domains: Mountain Car and Cart-pole (balancing).

Keywords :

learning (artificial intelligence); multi-agent systems; ACTAMER; Cart-pole; Mountain Car training; RL; TAMER framework; Tetris training; action spaces; augmenting reinforcement learning; continuous state-action domains; continuous state-discrete action problems; continuous states; discrete state-action; environment reward; evaluative reinforcement; human feedback; manually trained agents; reinforcement signal; Approximation algorithms; Function approximation; Humans; Learning; Training; Vectors;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Development and Learning and Epigenetic Robotics (ICDL), 2012 IEEE International Conference on

Conference_Location :

San Diego, CA

Print_ISBN :

978-1-4673-4964-2

Electronic_ISBN :

978-1-4673-4963-5

Type :

conf

DOI :

10.1109/DevLrn.2012.6400849

Filename :

6400849

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=586568