DocumentCode :
3709126
Title :
Nonparametric Bayesian reward segmentation for skill discovery using inverse reinforcement learning
Author :
Pravesh Ranchod; Benjamin Rosman; George Konidaris
Author_Institution :
School of Computer Science and Applied Mathematics, University of the Witwatersrand, Johannesburg, South Africa
fYear :
2015
Firstpage :
471
Lastpage :
477
Abstract :
We present a method for segmenting a set of unstructured demonstration trajectories to discover reusable skills using inverse reinforcement learning (IRL). Each skill is characterized by a latent reward function which the demonstrator is assumed to be optimizing. The skill boundaries and the number of skills making up each demonstration are unknown. We use a Bayesian nonparametric approach to propose skill segmentations and maximum entropy inverse reinforcement learning to infer reward functions from the segments. This method produces a set of Markov Decision Processes (MDPs) that best describe the input trajectories. We evaluate this approach in a car driving domain and a simulated quadcopter obstacle course, showing that it is able to recover demonstrated skills more effectively than existing methods.
Keywords :
"Trajectory","Hidden Markov models","Learning (artificial intelligence)","Bayes methods","Markov processes","Heuristic algorithms","Context"
Publisher :
ieee
Conference_Title :
2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Type :
conf
DOI :
10.1109/IROS.2015.7353414
Filename :
7353414