Title :
Intrinsically motivated model learning for a developing curious agent
Author :
Hester, Todd ; Stone, Peter
Author_Institution :
Dept. of Comput. Sci., Univ. of Texas at Austin, Austin, TX, USA
Abstract :
Reinforcement Learning (RL) agents are typically deployed to learn a specific, concrete task based on a pre-defined reward function. However, in some cases an agent may be able to gain experience in the domain prior to being given a task. In such cases, intrinsic motivation can be used to enable the agent to learn a useful model of the environment that is likely to help it learn its eventual tasks more efficiently. This paper presents the TEXPLORE with Variance-And-Novelty-Intrinsic-Rewards algorithm (TEXPLORE-VANIR), an intrinsically motivated model-based RL algorithm. The algorithm learns models of the transition dynamics of a domain using random forests. It calculates two different intrinsic motivations from this model: one to explore where the model is uncertain, and one to acquire novel experiences that the model has not yet been trained on. This paper presents experiments demonstrating that the combination of these two intrinsic rewards enables the algorithm to learn an accurate model of a domain with no external rewards and that the learned model can be used afterward to perform tasks in the domain. While learning the model, the agent explores the domain in a developing and curious way, progressively learning more complex skills. In addition, the experiments show that combining the agent's intrinsic rewards with external task rewards enables the agent to learn faster than using external rewards alone.
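Code_Sketch :
The abstract describes two intrinsic rewards derived from a random-forest transition model: a variance bonus where the model is uncertain, and a novelty bonus for experiences unlike the model's training data, optionally combined with an external task reward. The Python sketch below illustrates that general idea only; the class and parameter names (IntrinsicModel, v, n) and the novelty measure (L1 distance to the nearest experienced state-action pair) are assumptions for illustration, not the paper's exact formulation, which builds per-feature decision trees and plans over the learned model.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    class IntrinsicModel:
        """Illustrative sketch, not the authors' implementation."""

        def __init__(self, v=1.0, n=1.0, n_trees=10):
            self.v, self.n = v, n          # assumed weights on the two intrinsic bonuses
            self.forest = RandomForestRegressor(n_estimators=n_trees)
            self.seen = []                  # experienced (state, action) feature vectors

        def fit(self, sa_pairs, next_states):
            # Learn the transition dynamics: (state, action) -> next state.
            self.seen = [np.asarray(sa, dtype=float) for sa in sa_pairs]
            self.forest.fit(np.asarray(sa_pairs), np.asarray(next_states))

        def variance_bonus(self, sa):
            # Disagreement among the forest's trees: high where the
            # model is uncertain about the transition dynamics.
            preds = np.array([t.predict([sa])[0] for t in self.forest.estimators_])
            return preds.var(axis=0).mean()

        def novelty_bonus(self, sa):
            # Distance to the nearest experience the model was trained on:
            # high for (state, action) pairs unlike anything seen so far.
            # (L1 distance is an assumption made for this sketch.)
            dists = [np.abs(np.asarray(sa, dtype=float) - s).sum() for s in self.seen]
            return min(dists) if dists else 1.0

        def reward(self, sa, external=0.0):
            # Combined reward: external task reward plus both intrinsic terms;
            # with external=0.0 the agent explores from intrinsic motivation alone.
            return external + self.v * self.variance_bonus(sa) + self.n * self.novelty_bonus(sa)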
Keywords :
Markov processes; learning (artificial intelligence); multi-agent systems; pattern classification; Markov decision process formalism; TEXPLORE; TEXPLORE-VANIR; concrete task learning; curious agent development; external task rewards; intrinsic rewards; intrinsically motivated model learning; intrinsically motivated model-based RL algorithm; predefined reward function; random forests; reinforcement learning agents; transition dynamics; variance-and-novelty-intrinsic-rewards algorithm; Accuracy; Heuristic algorithms; Mathematical model; Prediction algorithms; Predictive models
Conference_Title :
2012 IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL)
Conference_Location :
San Diego, CA, USA
Print_ISBN :
978-1-4673-4964-2
Electronic_ISBN :
978-1-4673-4963-5
DOI :
10.1109/DevLrn.2012.6400802