Title :
Information theoretic reward shaping for curiosity driven learning in POMDPs
Author :
Mafi, Nassim ; Abtahi, Farnaz ; Fasel, Ian
Author_Institution :
Dept. of Comput. Sci., Univ. of Arizona, Tucson, AZ, USA
Abstract :
In the real world, intelligent agents must use multi-modal, limited-range, and limited-accuracy sensors to gain knowledge of the environment in service of their goals. The problem of choosing both sensing and task-specific actions can be viewed as a partially observable Markov decision process (POMDP), for which reinforcement learning (RL) can be used to learn policies from experience. In this paper we propose a mechanism for speeding up RL in POMDPs by using an information-based shaping reward, which can be automatically derived from the belief distribution. This reward acts as a domain-general intrinsic curiosity that allows the agent to improve its behavior even when it is not yet skilled enough to achieve task-specific goals. Previous work has shown that this intrinsic reward can lead to intelligent behaviors in the absence of a task. In this paper, we combine the curiosity reward with a task-specific reward in the parameter-exploring policy gradients (PGPE) algorithm in a “Market”, and show through several experiments that the curiosity reward significantly speeds up learning and improves the quality of policies compared to those that use only the extrinsic, task-specific reward signal.
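As a minimal sketch of the shaping idea described above: an intrinsic curiosity bonus can be computed as the reduction in Shannon entropy of the agent's belief distribution after an observation, and added to the extrinsic task reward. The function names, the discrete-belief representation, and the weighting parameter `beta` below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def belief_entropy(belief):
    """Shannon entropy (in nats) of a discrete belief over hidden states."""
    p = np.asarray(belief, dtype=float)
    p = p[p > 0]  # 0 * log(0) is taken as 0
    return float(-np.sum(p * np.log(p)))

def shaped_reward(extrinsic, belief_before, belief_after, beta=0.1):
    """Combine the task reward with an information-gain curiosity bonus:
    the drop in belief entropy caused by the latest observation.
    `beta` (illustrative) trades off curiosity against the task reward."""
    info_gain = belief_entropy(belief_before) - belief_entropy(belief_after)
    return extrinsic + beta * info_gain

# Example: an observation that collapses a uniform two-state belief
# yields an information gain of ln(2) nats on top of the task reward.
r = shaped_reward(1.0, [0.5, 0.5], [1.0, 0.0], beta=1.0)
```

This shaped reward would then replace the raw extrinsic reward inside the policy-gradient update; with `beta = 0` the agent falls back to learning from the task-specific signal alone.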
Keywords :
Markov processes; learning (artificial intelligence); software agents; curiosity driven learning; domain-general intrinsic curiosity; information theory; information-based shaping reward; intelligent agent; intelligent behavior; parameter exploring policy gradient algorithm; partially observable Markov decision process; reinforcement learning;
Conference_Titel :
Development and Learning (ICDL), 2011 IEEE International Conference on
Conference_Location :
Frankfurt am Main
Print_ISBN :
978-1-61284-989-8
DOI :
10.1109/DEVLRN.2011.6037344