Title :
Labeling Q-learning for non-Markovian environments
Author :
Lee, Hae Yeon; Kamaya, Hiroyuki; Abe, Kenichi
Author_Institution :
Dept. of Electr. & Commun. Eng., Tohoku Univ., Sendai, Japan
Abstract :
The most widely used reinforcement learning (RL) algorithms, such as Q-learning and TD(λ), are limited to Markovian environments. Recent research on RL algorithms has concentrated on partially observable Markov decision processes (POMDPs). The only way to overcome partial observability is to use memory to estimate the state. In this paper, we present a new memory architecture for RL algorithms that solves a certain class of POMDPs. Our algorithm, which we call labeling Q-learning (LQ-learning), is applied to simple maze problems taken from the recent literature. The results demonstrate LQ-learning's ability to perform in a near-optimal manner.
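The abstract does not detail the labeling mechanism, so the following Python sketch only illustrates the general idea: a tabular Q-learner whose internal state is an (observation, label) pair, with a small label memory used to tell perceptually aliased observations apart. The AliasedCorridor environment, the update_label rule, and all hyperparameters below are hypothetical stand-ins, not the authors' implementation.

import random
from collections import defaultdict

# Hedged sketch of the idea behind labeling Q-learning (LQ-learning):
# the Q-table is indexed by (observation, label) pairs rather than raw
# observations, so the label acts as a small piece of memory. The
# labeling rule below is a hypothetical placeholder, not the paper's method.

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
NUM_LABELS = 4          # size of the label memory (assumption)
ACTIONS = [0, 1]        # 0 = move left, 1 = move right

Q = defaultdict(float)  # Q[((obs, label), action)] -> value, default 0.0

def choose_action(obs, label):
    # Epsilon-greedy over the label-augmented internal state.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[((obs, label), a)])

def update_label(label, prev_obs, obs):
    # Hypothetical labeling rule: restart the label on a new observation
    # and advance it (mod NUM_LABELS) when the same observation repeats,
    # so consecutive visits to aliased cells get distinct internal states.
    return (label + 1) % NUM_LABELS if obs == prev_obs else 0

def run_episode(env, max_steps=50):
    obs, label = env.reset(), 0
    for _ in range(max_steps):
        action = choose_action(obs, label)
        next_obs, reward, done = env.step(action)
        next_label = update_label(label, obs, next_obs)
        # One-step Q-learning update on the label-augmented internal state.
        best_next = max(Q[((next_obs, next_label), a)] for a in ACTIONS)
        target = reward + (0.0 if done else GAMMA * best_next)
        Q[((obs, label), action)] += ALPHA * (target - Q[((obs, label), action)])
        obs, label = next_obs, next_label
        if done:
            break

class AliasedCorridor:
    # Toy POMDP: cells 0-1-2-3 in a row; cells 1 and 2 emit the same
    # observation (perceptual aliasing). Reaching cell 3 gives reward 1.
    OBS = (0, 1, 1, 2)
    def reset(self):
        self.pos = 0
        return self.OBS[self.pos]
    def step(self, action):
        self.pos = max(0, min(3, self.pos + (1 if action == 1 else -1)))
        done = self.pos == 3
        return self.OBS[self.pos], (1.0 if done else 0.0), done

env = AliasedCorridor()
for _ in range(500):
    run_episode(env)

Any rule mapping observation histories to a finite label set could replace update_label here; the paper's actual labeling scheme should be taken from the full text.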
Keywords :
learning (artificial intelligence); memory architecture; observability; state estimation; Markov decision process; labeling Q-learning; partial observability; reinforcement learning; Educational institutions; Feedback; Labeling; Learning; Signal processing; Testing; Working environment noise
Conference_Title :
1999 IEEE International Conference on Systems, Man, and Cybernetics (IEEE SMC '99) Conference Proceedings
Conference_Location :
Tokyo
Print_ISBN :
0-7803-5731-0
DOI :
10.1109/ICSMC.1999.815599