Labeling Q-learning embedded with knowledge update in partially observable MDP environments

Author

Lee, Haeyeon ; Kamaya, Hiroyuki ; Abe, Kenichi

Author_Institution

Electr. & Commun. Eng., Tohoku Univ., Aoba

fYear

2004

fDate

Aug. 30 2004-Sept. 1 2004

Firstpage

329

Lastpage

333

Abstract

In POMDP (partially observable Markov decision process) environments, a learning agent cannot observe the environment directly, so partially observed states arise. To overcome this partial observability problem, we previously proposed a new RL (reinforcement learning) algorithm called "labeling Q-learning" (LQ-learning). Unlike the original LQ-learning, the advanced LQ-learning prepares prior knowledge about the environment ahead of the learning process. The knowledge is a kind of self-organizing classification of sequences (i.e., patterns of state transitions). It provides classified sequences consisting of the states passed through; each such sequence is here called a "group". The new LQ-learning agent treats a transition between groups as a landmark-like labeling situation. In this paper, we extend the advanced LQ-learning, based on knowledge update, to a more extended environment. To demonstrate that LQ-learning works with embedded knowledge even when that knowledge was built from another environment, we apply it to grid-world problems that appear widely in the literature (Wiering and Schmidhuber, 1997).

Keywords

learning (artificial intelligence); LQ-learning agent; POMDP environment; RL algorithm; grid-world problem; labeling Q-learning; learning process; partially observable Markov decision process; reinforcement learning; self-organizing classification; Educational institutions; History; Labeling; Learning systems; Registers; State estimation;

fLanguage

English

Publisher

IEEE

Conference_Titel

Computational Cybernetics, 2004. ICCC 2004. Second IEEE International Conference on

Conference_Location

Vienna

Print_ISBN

0-7803-8588-8

Type

conf

DOI

10.1109/ICCCYB.2004.1437741

Filename

1437741