Title : 
Labeling Q-learning for non-Markovian environments
         
        
            Author : 
Lee, Hae Yeon; Kamaya, Hiroyuki; Abe, Kenichi
         
        
            Author_Institution : 
Dept. of Electr. & Commun. Eng., Tohoku Univ., Sendai, Japan


Abstract :
The most widely used reinforcement learning (RL) algorithms, such as Q-learning and TD(λ), are limited to Markovian environments. Recent research on reinforcement learning algorithms has concentrated on partially observable Markov decision processes (POMDPs). The only way to overcome partial observability is to use memory to estimate state. In this paper, we present a new memory architecture for RL algorithms to solve a certain type of POMDP. Our algorithm, which we call labeling Q-learning (LQ-learning), is applied to test problems of simple mazes taken from the recent literature. The results demonstrate LQ-learning's ability to work well in a near-optimal manner.
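This record gives only the abstract, so the Python sketch below is an illustration of the general idea the abstract describes, not the paper's exact algorithm: attach a memory-derived label to each raw observation so that perceptually aliased observations map to distinct internal states, then run ordinary Q-learning over the (observation, label) pairs. The class name, the label-cycling rule, and all hyperparameters are assumptions made for the example.

import random
from collections import defaultdict

class LabeledQAgent:
    # Q-learning over (observation, label) pairs. The labeling rule here
    # (cycle a per-observation counter within an episode) is an assumed
    # stand-in for the paper's labeling mechanism.
    def __init__(self, n_actions, max_label=4, alpha=0.1, gamma=0.95, eps=0.1):
        self.q = defaultdict(float)   # Q-values keyed by ((obs, label), action)
        self.n_actions = n_actions
        self.max_label = max_label    # size of the finite label set
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.labels = {}              # episode memory: obs -> current label

    def begin_episode(self):
        self.labels.clear()           # forget labels between episodes

    def internal_state(self, obs):
        # First visit gets label 0; each revisit advances the label, so two
        # visits to the same aliased observation become distinct states.
        if obs in self.labels:
            self.labels[obs] = (self.labels[obs] + 1) % self.max_label
        else:
            self.labels[obs] = 0
        return (obs, self.labels[obs])

    def act(self, state):
        # Epsilon-greedy action selection over the augmented state.
        if random.random() < self.eps:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning backup over augmented states.
        best_next = max(self.q[(next_state, a)] for a in range(self.n_actions))
        self.q[(state, action)] += self.alpha * (
            reward + self.gamma * best_next - self.q[(state, action)])

A training loop would call begin_episode() at each environment reset, pass every raw observation through internal_state() exactly once per step, and feed the resulting augmented states to act() and update().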
         
        
            Keywords : 
learning (artificial intelligence); memory architecture; observability; state estimation; Markov decision process; labeling Q-learning; partial observability; reinforcement learning; Educational institutions; Feedback; Labeling; Learning; Memory architecture; Observability; Signal processing; State estimation; Testing; Working environment noise


Conference_Title :
IEEE SMC '99 Conference Proceedings: 1999 IEEE International Conference on Systems, Man, and Cybernetics
         
        
            Conference_Location : 
Tokyo, Japan


Print_ISBN :
0-7803-5731-0


DOI :
10.1109/ICSMC.1999.815599