Adaptive Profit Sharing Reinforcement Learning Method for Dynamic Environment

Author

Koujaku, Sadamori ; Watanabe, Kota ; Igarashi, Hajime

Author_Institution

Grad. Sch. of Inf. Sci. Technol., Hokkaido Univ., Sapporo, Japan

Volume

1

fYear

2011

fDate

18-21 Dec. 2011

Firstpage

462

Lastpage

465

Abstract

This paper introduces an Adaptive Forgettable Profit Sharing reinforcement learning method that enables agents to adapt quickly to environmental changes. The method can be used to learn robust and effective actions in uncertain environments with non-Markov properties, in particular partially observable Markov decision processes (POMDPs). Profit Sharing learns a rational policy that is easy to learn and yields good behavior in POMDPs. However, the policy degrades in large, dynamic environments that change frequently and require many actions to reach the goal. To handle such environments, a forgetting mechanism is implemented that gives Profit Sharing both adaptability and rationality. It allows the agent to forget past experiences that reduce the rationality of its policy. The usefulness of the proposed algorithm is demonstrated through numerical examples.
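
The abstract does not give the update rule, so the following is only an illustrative sketch of how Profit Sharing combined with forgetting might look, not the authors' algorithm. It assumes the standard Profit Sharing scheme, in which the (observation, action) pairs of an episode are reinforced with credit that shrinks geometrically with distance from the reward, here by a factor of 1/(number of actions) per step. The forgetting step, which pulls every weight back toward its initial value before each reinforcement, is a hypothetical stand-in. The class name `ForgettingProfitSharing` and the parameters `forget_rate` and `init_weight` are likewise assumptions made for this example.

```python
from collections import defaultdict


class ForgettingProfitSharing:
    """Illustrative Profit Sharing learner with a simple decay-based forgetting step.

    Assumptions (not taken from the paper): geometric credit assignment with
    ratio 1/n_actions, and forgetting modeled as decay toward init_weight.
    Rewards are assumed to be positive so that weights stay positive.
    """

    def __init__(self, n_actions, forget_rate=0.1, init_weight=1.0):
        self.n_actions = n_actions
        self.forget_rate = forget_rate    # hypothetical parameter, 0 disables forgetting
        self.init_weight = init_weight
        # One weight per (observation, action) rule, created lazily.
        self.weights = defaultdict(lambda: [init_weight] * n_actions)
        self.episode = []                 # (observation, action) pairs since the last reward

    def record(self, obs, action):
        """Remember the rule that fired at this step of the episode."""
        self.episode.append((obs, action))

    def reinforce(self, reward):
        """Forget a little, then distribute the reward back along the episode."""
        # Forgetting (assumed form): shrink each weight's learned surplus.
        for w in self.weights.values():
            for a in range(self.n_actions):
                surplus = w[a] - self.init_weight
                w[a] = self.init_weight + (1.0 - self.forget_rate) * surplus
        # Credit assignment: the latest rule gets the full reward, earlier
        # rules get geometrically less.
        credit = reward
        for obs, action in reversed(self.episode):
            self.weights[obs][action] += credit
            credit /= self.n_actions
        self.episode.clear()

    def policy(self, obs):
        """Action probabilities proportional to the rule weights."""
        w = self.weights[obs]
        total = sum(w)
        return [x / total for x in w]


if __name__ == "__main__":
    agent = ForgettingProfitSharing(n_actions=2, forget_rate=0.5)
    agent.record("s", 0)
    agent.reinforce(8.0)      # action 0 is rewarded first
    agent.record("s", 1)
    agent.reinforce(8.0)      # environment changed: action 1 is rewarded now
    print(agent.policy("s"))
```

In this sketch, a larger `forget_rate` lets recently rewarded rules overtake stale ones sooner, at the cost of discarding experience that may still be valid.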

Keywords

Markov processes; incentive schemes; learning (artificial intelligence); adaptive forgettable profit sharing reinforcement learning method; dynamic environment; non-Markov property; partial observable Markov process; Educational institutions; Information science; Learning; Learning systems; Robustness; Reinforcement Learning; forgetting; rational theorem

fLanguage

English

Publisher

IEEE

Conference_Titel

Machine Learning and Applications and Workshops (ICMLA), 2011 10th International Conference on

Conference_Location

Honolulu, HI

Print_ISBN

978-1-4577-2134-2

Type

conf

DOI

10.1109/ICMLA.2011.25

Filename

6147020