Title :
Mixed Reinforcement Learning for Partially Observable Markov Decision Process
Author :
Dung, Le Tien ; Komeda, Takashi ; Takagi, Motoki
Author_Institution :
Shibaura Inst. of Technol., Tokyo
Abstract :
Reinforcement learning has been widely used to solve problems with little feedback from the environment. Q-learning can solve fully observable Markov decision processes quite well. For partially observable Markov decision processes (POMDPs), a recurrent neural network (RNN) can be used to approximate Q values. However, learning time for these problems is typically very long. In this paper, Mixed Reinforcement Learning is presented to find an optimal policy for POMDPs in a shorter learning time. This method uses both a Q value table and an RNN. The Q value table stores Q values for fully observable states, and the RNN approximates Q values for hidden states. An observable degree is calculated for each state while the agent explores the environment. If the observable degree is less than a threshold, the state is considered a hidden state. Results of experiments on the lighting grid world problem show that the proposed method enables an agent to acquire a policy as good as the policy acquired by using only an RNN, with better learning performance.
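The abstract describes a hybrid scheme in which a lookup table serves observable states and an RNN approximates Q values for hidden states, with a per-state observability measure deciding which estimator to use. Below is a minimal sketch of that selection logic only; all names (observable_degree, THRESHOLD, rnn_q, etc.) are illustrative assumptions, the RNN is stubbed out, and the paper's actual observable-degree computation is not reproduced here.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, THRESHOLD = 0.1, 0.95, 0.5   # assumed hyperparameters, not from the paper
ACTIONS = ["up", "down", "left", "right"]

q_table = defaultdict(lambda: {a: 0.0 for a in ACTIONS})  # Q values for observable states
observable_degree = defaultdict(lambda: 1.0)              # estimated observability per state


def rnn_q(history, action):
    """Placeholder for the recurrent approximator used on hidden states."""
    return 0.0  # a real implementation would run an RNN over the observation history


def q_value(state, history, action):
    # States whose observability falls below the threshold are treated as hidden
    # and routed to the RNN; all other states use the ordinary Q table.
    if observable_degree[state] < THRESHOLD:
        return rnn_q(history, action)
    return q_table[state][action]


def update_table(state, action, reward, next_state, next_history):
    # Standard Q-learning backup for an observable state.
    best_next = max(q_value(next_state, next_history, a) for a in ACTIONS)
    q_table[state][action] += ALPHA * (reward + GAMMA * best_next - q_table[state][action])


def epsilon_greedy(state, history, eps=0.1):
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_value(state, history, a))
```

The intent of the split is that table entries for observable states converge quickly via ordinary Q-learning, while the slower-to-train RNN is reserved for the hidden states that genuinely need history, which is what the abstract credits for the shorter learning time.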
Keywords :
Markov processes; learning (artificial intelligence); recurrent neural nets; Q learning; mixed reinforcement learning; partially observable Markov decision process; recurrent neural network; Computational intelligence; History; Learning; Neural networks; Neurofeedback; Recurrent neural networks; Robotics and automation; State-space methods; Table lookup;
Conference_Titel :
Computational Intelligence in Robotics and Automation, 2007. CIRA 2007. International Symposium on
Conference_Location :
Jacksonville, FL
Print_ISBN :
1-4244-0790-7
Electronic_ISBN :
1-4244-0790-7
DOI :
10.1109/CIRA.2007.382910