Title :
Proposal for an algorithm to improve a rational policy in POMDPs
Author :
Miyazaki, Kazuteru ; Kobayashi, Shigenobu
Author_Institution :
Int. Grad. Sch. of Sci. & Eng., Tokyo Inst. of Technol., Yokohama, Japan
Abstract :
Reinforcement learning is a kind of machine learning. The partially observable Markov decision process (POMDP) is a representative class of non-Markovian environments in reinforcement learning. The rational policy making (RPM) algorithm learns a deterministic rational policy in POMDPs. Although RPM can learn a policy very quickly, it needs numerous trials to improve that policy. Furthermore, RPM cannot be applied to the class of POMDPs in which no deterministic rational policy exists. In this paper, we propose the rational policy improvement (RPI) algorithm, which combines RPM with the mark transit algorithm using the χ2 goodness-of-fit test. RPI can learn a deterministic or stochastic rational policy in POMDPs. We apply RPI to maze environments and show that it learns the most stable rational policy in comparison with other methods.
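The record above does not describe RPI's internals, but the abstract says a χ2 goodness-of-fit test is used alongside the mark transit algorithm. As an illustrative sketch only (the state representation, counts, and significance level below are assumptions, not the paper's), a χ2 goodness-of-fit test can decide whether observed outcome counts for an action deviate significantly from the counts a candidate model would predict:

```python
# Hypothetical sketch: a Pearson chi-square goodness-of-fit test, as might be
# used to judge whether observed action outcomes fit an expected distribution.
# The 50/50 expectation and the alpha = 0.05 critical value are assumptions
# for illustration; the paper's actual test setup is not given in this record.

def chi_square_statistic(observed, expected):
    """Pearson's chi-square statistic over matched count lists."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def fits_model(observed, expected, critical_value):
    """True if the observed counts are consistent with the expected counts,
    i.e. the statistic stays below the chosen critical value."""
    return chi_square_statistic(observed, expected) < critical_value

# Example: 60 successes / 40 failures against a 50/50 expectation.
# df = 1, alpha = 0.05 -> critical value 3.841 (standard chi-square table).
stat = chi_square_statistic([60, 40], [50, 50])
print(stat)                                    # 4.0
print(fits_model([60, 40], [50, 50], 3.841))   # False: deviation is significant
```

Here 4.0 exceeds the 3.841 critical value, so the 50/50 hypothesis would be rejected; a milder split such as 52/48 would be accepted.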
Keywords :
Markov processes; decision theory; learning (artificial intelligence); learning systems; observability; machine learning; partially observable Markov decision process; rational policy improvement algorithm; rational policy making algorithm; reinforcement learning
Conference_Title :
1999 IEEE International Conference on Systems, Man, and Cybernetics (IEEE SMC '99) Conference Proceedings
Conference_Location :
Tokyo, Japan
Print_ISBN :
0-7803-5731-0
DOI :
10.1109/ICSMC.1999.815600