Title :
Exploitation/exploration learning for MDP environment
Author :
Iwata, Kazunori ; Ito, Nobuhiro ; Yamauchi, K. ; Ishii, Naohiro
Author_Institution :
Dept. of Intelligence & Comput. Sci., Nagoya Inst. of Technol., Japan
Abstract :
Reinforcement learning is an effective learning method in unknown environments, where a supervisor cannot support the learner. However, the agent needs a large number of trial-and-error interactions to find optimal behaviors. This becomes a serious problem when the agent is in a dynamic environment, because it cannot adapt quickly to a changed environment. To overcome this drawback, we propose a new reinforcement learning method for quick adaptation. In the new method, the agent maintains both an exploitation (EI) strategy and an exploration (ER) strategy and alternates between them for each state. In the EI strategy, the agent tries to select the best action using its past memory. In the ER strategy, the agent tries to identify the environment using an estimate of the error and searches for new states in unknown regions of the state space. Using these two strategies, the agent reduces redundant searching in the state space. Experimental results show that the agent adapts quickly to unknown environments.
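To illustrate the idea of switching per state between the EI and ER strategies, here is a minimal sketch in Python. It assumes a tabular MDP and Q-learning; the class name `EIERAgent`, the count-based error proxy, and the `error_threshold` parameter are illustrative assumptions, not the paper's exact formulation of the error estimate or switching criterion.

```python
import random
from collections import defaultdict


class EIERAgent:
    """Sketch: per-state switching between exploitation (EI) and
    exploration (ER). The error estimate below is a simple count-based
    proxy, assumed for illustration; the paper's estimator may differ."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, error_threshold=0.2):
        self.actions = actions
        self.alpha = alpha                # learning rate
        self.gamma = gamma                # discount factor
        self.error_threshold = error_threshold
        self.q = defaultdict(float)       # Q-values keyed by (state, action)
        self.visits = defaultdict(int)    # visit counts per (state, action)

    def estimated_error(self, state):
        # Illustrative proxy: a state whose least-tried action has few
        # visits is treated as poorly identified (high estimated error).
        n = min(self.visits[(state, a)] for a in self.actions)
        return 1.0 / (1.0 + n)

    def select_action(self, state):
        if self.estimated_error(state) > self.error_threshold:
            # ER strategy: probe the least-tried action to identify the
            # environment and reach unknown regions of the state space.
            return min(self.actions, key=lambda a: self.visits[(state, a)])
        # EI strategy: exploit past memory by acting greedily on Q.
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard Q-learning backup; visit counts feed the error proxy.
        self.visits[(state, action)] += 1
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

In this sketch the agent explores only where its model of the environment is still uncertain and exploits elsewhere, which is one plausible way to realize the reduced redundant searching described in the abstract.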
Keywords :
Markov processes; decision theory; learning (artificial intelligence); optimisation; software agents; Markov decision process environments; exploitation learning; exploration learning; reinforcement learning; upper bound; Acceleration; Computer science; Delay effects; Estimation error; Learning; State estimation; State-space methods; Switches;
Conference_Titel :
Industrial Electronics Society, 2000. IECON 2000. 26th Annual Conference of the IEEE
Conference_Location :
Nagoya
Print_ISBN :
0-7803-6456-2
DOI :
10.1109/IECON.2000.973141