Performance of LQ-learning in POMDP environments

Author

Lee, Haeyeon ; Kama, Hiroyuki ; Abe, Kenich

Author_Institution

Sch. of Eng., Tohoku Univ., Sendai, Japan

Volume

fYear

2002

fDate

5-7 Aug. 2002

Firstpage

819

Abstract

In this paper, we propose a new type of LQ-learning for solving POMDPs. In a POMDP environment, the agent cannot observe the state of the environment directly. To discriminate between partially observed states, the LQ-learning agent attaches a label to each observation, so that states perceived as the same observation can be told apart. Unlike our previous LQ-learning, the new method prepares knowledge about the environment in advance. This knowledge is acquired automatically by Kohonen's Self-Organizing Map (SOM), which provides the agent with knowledge about state transitions. The LQ-learning agent then attaches labels to observations with reference to the map obtained by the SOM.
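The labeling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not say how the map is consulted, so the sketch assumes that the label is the index of the best-matching unit of an already trained SOM for the transition vector (previous observation, current observation), and that learning is ordinary tabular Q-learning over (observation, label) pairs. The names `best_matching_unit`, `labeled_state`, `q_update` and the toy weights are hypothetical.

```python
from collections import defaultdict


def best_matching_unit(som_weights, vector):
    """Return the index of the SOM unit whose weight vector is closest to `vector`."""
    def sq_dist(weights):
        return sum((w - v) ** 2 for w, v in zip(weights, vector))
    return min(range(len(som_weights)), key=lambda i: sq_dist(som_weights[i]))


def labeled_state(som_weights, prev_obs, obs):
    """Pair an observation with a label read from the SOM.

    ASSUMPTION (not stated in the abstract): the label is the best-matching
    unit for the transition (prev_obs, obs).
    """
    label = best_matching_unit(som_weights, (prev_obs, obs))
    return (obs, label)


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update applied to labeled states."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


# Toy SOM with two units, standing in for a map trained in advance (assumed values).
som_weights = [(0.0, 1.0), (2.0, 1.0)]
actions = ["left", "right"]
q = defaultdict(float)

# The same observation (1) reached from different predecessors gets different labels.
s_a = labeled_state(som_weights, prev_obs=0, obs=1)
s_b = labeled_state(som_weights, prev_obs=2, obs=1)
q_update(q, s_a, "left", reward=1.0, next_state=s_b, actions=actions)
```

In this toy setup the two visits to observation 1 become distinct entries in the Q-table, which is the disambiguation effect that the labels are meant to provide.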

Keywords

Markov processes; learning (artificial intelligence); self-organising feature maps; Kohonen self-organizing map; LQ-learning performance; POMDP environments; partially observed Markov decision processes; reinforcement learning; Educational institutions; Feedback; History; Labeling; Learning; Leg; Organizing; Registers; Signal processing; State estimation

fLanguage

English

Publisher

IEEE

Conference_Titel

SICE 2002. Proceedings of the 41st SICE Annual Conference

Print_ISBN

0-7803-7631-5

Type

conf

DOI

10.1109/SICE.2002.1195263

Filename

1195263

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=393477