Regional Information Center for Science and Technology - Reinforcement learning from human reward: Discounting in episodic tasks

DocumentCode :

2020499

Title :

Reinforcement learning from human reward: Discounting in episodic tasks

Author :

Knox, W. Bradley ; Stone, Peter

Author_Institution :

Dept. of Comput. Sci., Univ. of Texas at Austin, Austin, TX, USA

fYear :

2012

fDate :

9-13 Sept. 2012

Firstpage :

878

Lastpage :

885

Abstract :

Several studies have demonstrated that teaching agents by human-generated reward can be a powerful technique. However, the algorithmic space for learning from human reward has hitherto not been explored systematically. Using model-based reinforcement learning from human reward in goal-based, episodic tasks, we investigate how anticipated future rewards should be discounted to create behavior that performs well on the task that the human trainer intends to teach. We identify a “positive circuits” problem with low discounting (i.e., high discount factors) that arises from an observed bias among humans towards giving positive reward. Empirical analyses indicate that high discounting (i.e., low discount factors) of human reward is necessary in goal-based, episodic tasks and lend credence to the existence of the positive circuits problem.
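The trade-off described in the abstract can be illustrated with a toy calculation. The sketch below is not taken from the paper: the reward values, the three-step goal path, and the function names are all hypothetical, chosen only to show how a circuit of small positive rewards can outweigh a finite path to the goal once the discount factor is close to 1.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite sequence of rewards."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def circuit_return(reward_per_step, gamma):
    """Return of collecting the same reward forever: r / (1 - gamma)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1)")
    return reward_per_step / (1.0 - gamma)


def prefers_goal(gamma, goal_rewards=(1.0, 1.0, 1.0), circuit_reward=0.1):
    """True if the finite goal path is worth more than looping forever.

    Hypothetical setup: the goal path yields `goal_rewards` and then the
    episode ends (no further reward); the circuit yields a small positive
    `circuit_reward` on every step and never reaches the goal.
    """
    goal_value = discounted_return(goal_rewards, gamma)
    loop_value = circuit_return(circuit_reward, gamma)
    return goal_value > loop_value


if __name__ == "__main__":
    for gamma in (0.5, 0.9, 0.99):
        print(gamma, prefers_goal(gamma))
```

Under these assumed numbers, the goal path wins at a discount factor of 0.5 and the circuit wins at 0.99, which matches the direction of the abstract's claim; the actual tasks, reward models, and thresholds studied by the authors are not reproduced here.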

Keywords :

behavioural sciences; computer aided instruction; interactive systems; learning (artificial intelligence); software agents; teaching; agent teaching; algorithmic space; discount factors; empirical analysis; goal-based episodic tasks; human trainer; human-generated positive reward; model-based reinforcement learning; positive circuits problem; Algorithm design and analysis; Analytical models; Humans; Integrated circuit modeling; Learning; Training;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

RO-MAN, 2012 IEEE

Conference_Location :

Paris

ISSN :

1944-9445

Print_ISBN :

978-1-4673-4604-7

Electronic_ISBN :

1944-9445

Type :

conf

DOI :

10.1109/ROMAN.2012.6343862

Filename :

6343862

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2020499