Title :
Monte Carlo preference elicitation for learning additive reward functions
Author :
Rosenthal, Stephanie ; Veloso, Manuela
Author_Institution :
Carnegie Mellon Univ., Pittsburgh, PA, USA
Abstract :
AI agents, including robots, often use reward functions to evaluate tradeoffs between different states and actions and to determine optimal policies. We are particularly interested in reward functions that can be decomposed into an additive sum of subrewards computed on independent subproblems or features of the state space. If these subrewards capture different reward metrics, such as user satisfaction and task completion time, it is unclear how to scale the subrewards in the reward function to produce an appropriate policy. In this work, we propose and evaluate a novel Monte Carlo method for learning the scaling factors of the subrewards, in which training elicits humans' preferences between pairs of state-action scenarios. Because the algorithm elicits preferences over explicit scenarios, it is less susceptible to human error than previous elicitation approaches. The preferences are used to generate a set of inequalities over the scaling factors, which we solve efficiently with a linear program. We show that our algorithm asks for a number of preferences proportional to the logarithm of the number of scaling-factor hypotheses used in the Monte Carlo method.
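A minimal sketch of the approach summarized above, under assumptions made only for illustration: the reward decomposes as R(s, a) = sum_i w_i * r_i(s, a), each preference of scenario A over scenario B yields the inequality w . (r(A) - r(B)) >= 0, and the collected inequalities are solved as a feasibility linear program over the scaling factors w. The hypothesis sampling, the even-split query heuristic, and the interfaces subreward(scenario) and ask_human(a, b) are illustrative placeholders, not the paper's implementation.

import numpy as np
from scipy.optimize import linprog


def fit_scaling_factors(preferences, n_features, margin=1e-3):
    """Find scaling factors w >= 0 (summing to 1) consistent with every
    elicited preference, posed as a feasibility linear program.

    `preferences` is a list of (r_pref, r_other) subreward-vector pairs; each
    pair contributes the inequality w . (r_pref - r_other) >= margin."""
    if not preferences:
        return np.full(n_features, 1.0 / n_features)  # no constraints yet
    # scipy expects A_ub @ w <= b_ub, so negate each preference inequality.
    A_ub = np.array([np.asarray(q) - np.asarray(p) for p, q in preferences])
    b_ub = np.full(len(preferences), -margin)
    res = linprog(c=np.zeros(n_features),  # zero objective: any feasible point will do
                  A_ub=A_ub, b_ub=b_ub,
                  A_eq=np.ones((1, n_features)), b_eq=[1.0],
                  bounds=[(0.0, None)] * n_features)
    return res.x if res.success else None


def elicit_scaling_factors(hypotheses, scenarios, subreward, ask_human):
    """Monte Carlo elicitation loop: a sampled set of candidate weight vectors
    (`hypotheses`) is pruned by each preference answer.  `subreward(s)` maps a
    scenario to its vector of subreward values and `ask_human(a, b)` returns
    whichever of the two scenarios the person prefers."""
    n_features = len(hypotheses[0])
    preferences = []
    while len(hypotheses) > 1:
        def split_gap(pair):
            a, b = pair
            prefer_a = sum(np.dot(w, subreward(a)) >= np.dot(w, subreward(b))
                           for w in hypotheses)
            return abs(prefer_a - len(hypotheses) / 2.0)
        # Query the scenario pair on which the hypotheses disagree most evenly,
        # so each answer discards roughly half of them.
        a, b = min(((x, y) for i, x in enumerate(scenarios)
                    for y in scenarios[i + 1:]), key=split_gap)
        winner = ask_human(a, b)
        loser = b if winner is a else a
        preferences.append((subreward(winner), subreward(loser)))
        survivors = [w for w in hypotheses
                     if np.dot(w, subreward(winner)) >= np.dot(w, subreward(loser))]
        if not survivors or len(survivors) == len(hypotheses):
            break  # inconsistent answer, or no remaining query is informative
        hypotheses = survivors
    return fit_scaling_factors(preferences, n_features)

The even-split heuristic mirrors the abstract's logarithmic query bound: a query on which the surviving hypotheses disagree roughly half-and-half discards about half of them per answer. The uniform-weight fallback when no preferences have been collected is a choice made only for this sketch.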
Keywords :
Monte Carlo methods; learning (artificial intelligence); linear programming; multi-agent systems; AI agents; Monte Carlo preference elicitation; additive reward function learning; human preference elicitation; linear program; optimal policy; reward metrics; state space; state-action scenarios; subreward additive sum; subreward scaling factor learning; task completion time; user satisfaction; Additives; Approximation algorithms; Concrete; Humans; Probabilistic logic; Robots
Conference_Titel :
2012 IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN)
Conference_Location :
Paris, France
Print_ISBN :
978-1-4673-4604-7
Electronic_ISSN :
1944-9445
DOI :
10.1109/ROMAN.2012.6343863