DocumentCode :
2043949
Title :
Exploitation/exploration learning for MDP environment
Author :
Iwata, Kazunori ; Ito, Nobuhiro ; Yamauchi, K. ; Ishii, Naohiro
Author_Institution :
Dept. of Intelligence & Comput. Sci., Nagoya Inst. of Technol., Japan
Volume :
1
fYear :
2000
fDate :
2000
Firstpage :
149
Abstract :
Reinforcement learning is an effective learning method in unknown environments, where a supervisor cannot support the learner. However, the agent needs a large number of trial-and-error interactions to find optimal behaviors. This becomes a serious problem in a dynamic environment, because the agent cannot adapt quickly to the changed environment. To overcome this drawback, we propose a new reinforcement learning method for quick adaptation. In the new method, the agent maintains both an exploitation (EI) strategy and an exploration (ER) strategy, switching between them for each state. In the EI strategy, the agent tries to select the best action using its past memory. In the ER strategy, the agent tries to identify the environment using an estimation of error and to search for new states in the unknown region of the state space. Using these two strategies, the agent reduces redundant searching in the state space. Experimental results show that the agent adapts quickly to unknown environments.
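The per-state EI/ER switching described above can be sketched in code. The following Python fragment is a minimal illustration under assumed details, not the authors' exact algorithm: a Q-table stands in for the "past memory", a smoothed TD-error magnitude stands in for the per-state "estimation of error", and a fixed threshold decides which strategy a state uses; the paper's actual update rules and switching criterion may differ.

import random
from collections import defaultdict

class EIERAgent:
    def __init__(self, actions, alpha=0.1, gamma=0.9, err_threshold=0.05):
        self.actions = list(actions)
        self.alpha = alpha                   # learning rate
        self.gamma = gamma                   # discount factor
        self.err_threshold = err_threshold   # EI/ER switching point (assumed)
        self.q = defaultdict(float)          # past memory: Q(state, action)
        self.err = defaultdict(lambda: 1.0)  # per-state error estimate, starts high

    def select_action(self, state):
        if self.err[state] > self.err_threshold:
            # ER strategy: the value estimate for this state is still
            # unreliable, so search it (uniform random choice, an assumption).
            return random.choice(self.actions)
        # EI strategy: select the best action from past memory.
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error
        # A smoothed |TD error| drives the per-state EI/ER switch: states
        # whose estimates have settled are exploited, others explored.
        self.err[state] = 0.9 * self.err[state] + 0.1 * abs(td_error)

Because the error estimate is tracked per state, well-learned regions of the state space are exploited while unfamiliar regions continue to be explored, which is the mechanism the abstract credits for reducing redundant search.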
Keywords :
Markov processes; decision theory; learning (artificial intelligence); optimisation; software agents; Markov decision process environments; exploitation learning; exploration learning; optimisation; reinforcement learning; software agents; upper bound; Acceleration; Computer science; Delay effects; Estimation error; Learning; State estimation; State-space methods; Switches;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Industrial Electronics Society, 2000. IECON 2000. 26th Annual Conference of the IEEE
Conference_Location :
Nagoya
Print_ISBN :
0-7803-6456-2
Type :
conf
DOI :
10.1109/IECON.2000.973141
Filename :
973141