DocumentCode :
1749204
Title :
Coordination of exploration and exploitation in a dynamic environment
Author :
Yen, Gary ; Yang, Fengming ; Hickey, Travis ; Goldstein, Michel
Author_Institution :
Sch. of Electr. & Comput. Eng., Oklahoma State Univ., Stillwater, OK, USA
Volume :
2
fYear :
2001
fDate :
2001
Firstpage :
1014
Abstract :
One much-researched issue in reinforcement learning is the trade-off between exploration and exploitation. The ability to balance exploration and exploitation effectively becomes even more crucial in a dynamic environment. An algorithm is proposed herein that offers one solution to the exploration-versus-exploitation dilemma. The algorithm is presented in the context of a path-finding agent in a dynamic grid-world problem. The state-value function used is penalty-based, allowing the agent to act over the space of paths with minimal penalties. A forgetting mechanism is implemented that allows the agent to re-explore paths previously determined to be suboptimal. Simulation results are used to analyze the behavior of the proposed algorithm in a dynamic environment.
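The following is a minimal sketch of the mechanism the abstract describes, not the authors' exact algorithm: tabular Q-learning (a method named in the keywords) on a grid world where rewards are negative penalties, plus a forgetting step that decays learned values back toward their initial estimates so paths written off as suboptimal can be re-explored after the environment changes. The grid size, penalty value, and decay rate below are illustrative assumptions.

import random

ROWS, COLS = 5, 5
GOAL = (4, 4)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

ALPHA, GAMMA = 0.5, 0.9   # learning rate, discount factor
EPSILON = 0.1             # exploration probability
FORGET = 0.001            # per-episode decay toward the initial value
STEP_PENALTY = -1.0       # every move incurs a small penalty

Q = {((r, c), a): 0.0 for r in range(ROWS) for c in range(COLS)
     for a in range(len(ACTIONS))}

def step(state, action):
    """Apply an action; walls keep the agent in place."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr = max(0, min(ROWS - 1, r + dr))
    nc = max(0, min(COLS - 1, c + dc))
    next_state = (nr, nc)
    reward = 0.0 if next_state == GOAL else STEP_PENALTY
    return next_state, reward

def choose_action(state):
    """Epsilon-greedy balance between exploration and exploitation."""
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: Q[(state, a)])

for episode in range(500):
    state = (0, 0)
    while state != GOAL:
        action = choose_action(state)
        next_state, reward = step(state, action)
        best_next = max(Q[(next_state, a)] for a in range(len(ACTIONS)))
        # Standard Q-learning update with penalty-based rewards.
        Q[(state, action)] += ALPHA * (
            reward + GAMMA * best_next - Q[(state, action)])
        state = next_state
    # Forgetting: decay all values toward 0, the optimistic initial
    # estimate, nudging the agent to revisit discarded paths.
    for key in Q:
        Q[key] *= (1.0 - FORGET)

Because all rewards here are penalties, learned Q-values are non-positive; decaying them toward zero makes stale estimates look optimistic again, which is one simple way to realize the forgetting behavior the abstract attributes to the proposed method.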
Keywords :
learning (artificial intelligence); software agents; Q learning; dynamic grid-world problem; exploitation; exploration; forgetting mechanism; path-finding agent; penalty; reinforcement learning; state-value function; Acceleration; Algorithm design and analysis; Analytical models; Control systems; Genetic algorithms; Heuristic algorithms; Intelligent control; Intelligent systems; Learning systems; Systems engineering and theory;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Proceedings of the International Joint Conference on Neural Networks (IJCNN '01), 2001
Conference_Location :
Washington, DC
ISSN :
1098-7576
Print_ISBN :
0-7803-7044-9
Type :
conf
DOI :
10.1109/IJCNN.2001.939499
Filename :
939499