DocumentCode :
2340430
Title :
Eliciting preferences over observed behaviours based on relative evaluations
Author :
Da Silva, Valdinei Freire ; Lima, Pedro ; Costa, Anna Helena Reali
Author_Institution :
Univ. of Sao Paulo, Sao Paulo
fYear :
2007
fDate :
Oct. 29 2007-Nov. 2 2007
Firstpage :
423
Lastpage :
428
Abstract :
Reinforcement learning addresses the question of programming an autonomous agent to execute tasks that are described as reinforcement functions. The agent is then responsible for discovering the best actions to fulfil such a task. Most work on reinforcement learning assumes that reinforcements are given by the environment and does not address the problem of how to describe tasks as reinforcement functions. Preference elicitation addresses the problem of describing human preferences through utility functions, of which reinforcement functions are a special case. This paper proposes an approach in which preference elicitation and reinforcement learning are handled in an integrated manner, providing an autonomous method of programming an agent. The agent is programmed through pairwise evaluations over its observed behaviours, and the evaluations are summarised in the reinforcement function. We propose a new algorithm, PEOB-RS, that can be shown to converge towards an optimal policy, provided that the number of trials for each behaviour tends to infinity. Experimental results from learning in a stochastic grid environment are used to obtain a reinforcement function, illustrating the effectiveness of PEOB-RS, although it requires too many evaluations. This reinforcement function is then transferred to a more realistic environment simulating a Pioneer robot, showing the abstraction property of utility functions.
Keywords :
learning (artificial intelligence); mobile robots; robot programming; autonomous agent programming; grid stochastic environment; pioneer robot; preference elicitation; reinforcement learning; utility function; Autonomous agents; Functional programming; Intelligent robots; Learning; Notice of Violation; Programming profession; Robot programming; Stochastic processes; USA Councils; Utility theory;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligent Robots and Systems, 2007. IROS 2007. IEEE/RSJ International Conference on
Conference_Location :
San Diego, CA
Print_ISBN :
978-1-4244-0912-9
Electronic_ISBN :
978-1-4244-0912-9
Type :
conf
DOI :
10.1109/IROS.2007.4399403
Filename :
4399403