Title :
Proposal for an algorithm to improve a rational policy in POMDPs
Author :
Miyazaki, Kazuteru ; Kobayashi, Shigenobu
Author_Institution :
Int. Grad. Sch. of Sci. & Eng., Tokyo Inst. of Technol., Yokohama, Japan
Abstract :
Reinforcement learning is a kind of machine learning. The partially observable Markov decision process (POMDP) is a representative class of non-Markovian environments in reinforcement learning. The rational policy making (RPM) algorithm learns a deterministic rational policy in POMDPs. Although RPM can learn a policy very quickly, it needs numerous trials to improve that policy. Furthermore, RPM cannot be applied to the class of POMDPs in which no deterministic rational policy exists. In this paper, we propose the rational policy improvement (RPI) algorithm, which combines RPM with the mark transit algorithm using the χ2 goodness-of-fit test. RPI can learn a deterministic or stochastic rational policy in POMDPs. We apply RPI to maze environments and show that it learns the most stable rational policy in comparison with other methods.
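The record above does not describe RPI's internals, but the abstract says a χ2 goodness-of-fit test is used alongside the mark transit algorithm. As an illustrative sketch only (the state representation, counts, and significance level below are assumptions, not the paper's), a χ2 goodness-of-fit test can decide whether observed outcome counts for an action deviate significantly from the counts a candidate model would predict:

```python
# Hypothetical sketch: a Pearson chi-square goodness-of-fit test, as might be
# used to judge whether observed action outcomes fit an expected distribution.
# The 50/50 expectation and the alpha = 0.05 critical value are assumptions
# for illustration; the paper's actual test setup is not given in this record.

def chi_square_statistic(observed, expected):
    """Pearson's chi-square statistic over matched count lists."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def fits_model(observed, expected, critical_value):
    """True if the observed counts are consistent with the expected counts,
    i.e. the statistic stays below the chosen critical value."""
    return chi_square_statistic(observed, expected) < critical_value

# Example: 60 successes / 40 failures against a 50/50 expectation.
# df = 1, alpha = 0.05 -> critical value 3.841 (standard chi-square table).
stat = chi_square_statistic([60, 40], [50, 50])
print(stat)                                    # 4.0
print(fits_model([60, 40], [50, 50], 3.841))   # False: deviation is significant
```

Here 4.0 exceeds the 3.841 critical value, so the 50/50 hypothesis would be rejected; a milder split such as 52/48 would be accepted.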
Keywords :
Markov processes; decision theory; learning (artificial intelligence); learning systems; observability; machine learning; partially observable Markov decision process; rational policy improvement algorithm; rational policy making algorithm; reinforcement learning
Conference_Title :
1999 IEEE International Conference on Systems, Man, and Cybernetics (IEEE SMC '99) Conference Proceedings
Conference_Location :
Tokyo, Japan
Print_ISBN :
0-7803-5731-0
DOI :
10.1109/ICSMC.1999.815600