Title :
Exploitation/exploration learning for MDP environment
Author :
Iwata, Kazunori ; Ito, Nobuhiro ; Yamauchi, K. ; Ishii, Naohiro
Author_Institution :
Dept. of Intelligence & Comput. Sci., Nagoya Inst. of Technol., Japan
Abstract :
Reinforcement learning is an effective learning method in unknown environments, where a supervisor cannot support the learner. However, the agent needs a large number of trial-and-error interactions to find optimal behaviors. This becomes a serious problem when the agent is in a dynamic environment, because it cannot adapt quickly to a changed environment. To overcome this drawback, we propose a new reinforcement learning method for quick adaptation. In the new method, the agent maintains both an exploitation (EI) strategy and an exploration (ER) strategy and alternates between them for each state. In the EI strategy, the agent tries to select the best action using its past memory. In the ER strategy, the agent tries to identify the environment using an estimate of the error and searches for new states in unknown regions of the state space. Using these two strategies, the agent reduces redundant searching in the state space. Experimental results show that the agent adapts quickly to unknown environments.
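To illustrate the idea of switching per state between the EI and ER strategies, here is a minimal sketch in Python. It assumes a tabular MDP and Q-learning; the class name `EIERAgent`, the count-based error proxy, and the `error_threshold` parameter are illustrative assumptions, not the paper's exact formulation of the error estimate or switching criterion.

```python
import random
from collections import defaultdict


class EIERAgent:
    """Sketch: per-state switching between exploitation (EI) and
    exploration (ER). The error estimate below is a simple count-based
    proxy, assumed for illustration; the paper's estimator may differ."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, error_threshold=0.2):
        self.actions = actions
        self.alpha = alpha                # learning rate
        self.gamma = gamma                # discount factor
        self.error_threshold = error_threshold
        self.q = defaultdict(float)       # Q-values keyed by (state, action)
        self.visits = defaultdict(int)    # visit counts per (state, action)

    def estimated_error(self, state):
        # Illustrative proxy: a state whose least-tried action has few
        # visits is treated as poorly identified (high estimated error).
        n = min(self.visits[(state, a)] for a in self.actions)
        return 1.0 / (1.0 + n)

    def select_action(self, state):
        if self.estimated_error(state) > self.error_threshold:
            # ER strategy: probe the least-tried action to identify the
            # environment and reach unknown regions of the state space.
            return min(self.actions, key=lambda a: self.visits[(state, a)])
        # EI strategy: exploit past memory by acting greedily on Q.
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard Q-learning backup; visit counts feed the error proxy.
        self.visits[(state, action)] += 1
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

In this sketch the agent explores only where its model of the environment is still uncertain and exploits elsewhere, which is one plausible way to realize the reduced redundant searching described in the abstract.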
Keywords :
Markov processes; decision theory; learning (artificial intelligence); optimisation; software agents; Markov decision process environments; exploitation learning; exploration learning; reinforcement learning; upper bound; Acceleration; Computer science; Delay effects; Estimation error; Learning; State estimation; State-space methods; Switches;
Conference_Titel :
Industrial Electronics Society, 2000. IECON 2000. 26th Annual Conference of the IEEE
Conference_Location :
Nagoya
Print_ISBN :
0-7803-6456-2
DOI :
10.1109/IECON.2000.973141