Title :
Region enhanced neural Q-learning for solving model-based POMDPs
Author :
Wiering, Marco A. ; Kooi, Thijs
Author_Institution :
Dept. of Artificial Intell., Univ. of Groningen, Groningen, Netherlands
Abstract :
To perform tasks autonomously, a robot has to plan its behavior and make decisions based on the input it receives. Unfortunately, contemporary robot sensors and actuators are subject to noise, turning optimal decision making into a stochastic process. To model this process, partially observable Markov decision processes (POMDPs) can be applied. In this paper we introduce the RENQ algorithm, a new POMDP algorithm that combines neural networks for estimating Q-values with the construction of a spatial pyramid over the state space. RENQ essentially uses region-based belief vectors together with state-based belief vectors, and both serve as inputs to the neural network trained with Q-learning. We compare RENQ to Qmdp and Perseus, two state-of-the-art algorithms for approximately solving model-based POMDPs. The results on three different maze navigation tasks indicate that RENQ outperforms Perseus on all problems, and outperforms Qmdp as the problems become larger.
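The input construction described above can be sketched in code. The following is a minimal illustration, not the paper's implementation: it assumes a 4x4 grid state space, a single coarser pyramid level of 2x2 regions whose belief is the summed belief of their member states, and a linear Q-value approximator (standing in for the paper's neural network) updated with Q-learning over the concatenated state-plus-region belief input. All names and parameter values are our own.

```python
import random

GRID = 4                # hypothetical 4x4 maze: 16 states
N_STATES = GRID * GRID
N_ACTIONS = 4           # up, down, left, right

def region_beliefs(belief):
    """Coarsen the state belief onto a 2x2 region grid: each region's
    belief is the summed belief of its 4 member cells."""
    regions = [0.0] * 4
    for s, p in enumerate(belief):
        row, col = s // GRID, s % GRID
        regions[(row // 2) * 2 + (col // 2)] += p
    return regions

def renq_input(belief):
    """RENQ-style input: state-based belief vector concatenated with
    the region-based belief vector from the spatial pyramid."""
    return belief + region_beliefs(belief)

class LinearQ:
    """Linear stand-in for the paper's neural network:
    Q(b, a) = w_a . x(b), trained with the standard Q-learning update."""
    def __init__(self, n_inputs, n_actions, lr=0.1, gamma=0.95):
        self.w = [[0.0] * n_inputs for _ in range(n_actions)]
        self.lr, self.gamma = lr, gamma

    def q(self, x, a):
        return sum(wi * xi for wi, xi in zip(self.w[a], x))

    def update(self, x, a, reward, x_next, done):
        # TD target: r + gamma * max_a' Q(x', a') for non-terminal steps
        target = reward
        if not done:
            target += self.gamma * max(self.q(x_next, b)
                                       for b in range(N_ACTIONS))
        td_error = target - self.q(x, a)
        for i, xi in enumerate(x):
            self.w[a][i] += self.lr * td_error * xi
```

For a uniform belief over the 16 states, each of the 4 regions receives belief 0.25, and the augmented input has length 20; the network then sees both where the agent might be exactly and in which coarse region its belief mass concentrates.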
Keywords :
Markov processes; decision making; learning (artificial intelligence); neural nets; path planning; robots; actuators; autonomous robot; maze navigation task; model-based POMDP; neural Q-learning; partially observable Markov decision processes; rendering optimal decision making; robot sensors; stochastic process; Navigation; Robot sensing systems; Variable speed drives;
Conference_Titel :
The 2010 International Joint Conference on Neural Networks (IJCNN)
Conference_Location :
Barcelona, Spain
Print_ISBN :
978-1-4244-6916-1
DOI :
10.1109/IJCNN.2010.5596811