Title :
Coordination of exploration and exploitation in a dynamic environment
Author :
Yen, Gary ; Yang, Fengming ; Hickey, Travis ; Goldstein, Michel
Author_Institution :
Sch. of Electr. & Comput. Eng., Oklahoma State Univ., Stillwater, OK, USA
Abstract :
A much-researched issue in reinforcement learning is the trade-off between exploration and exploitation. The ability to balance exploration and exploitation effectively becomes even more crucial in a dynamic environment. An algorithm is proposed herein that provides one solution to the exploration-versus-exploitation dilemma. The algorithm is presented in the context of a path-finding agent in a dynamic grid-world problem. The state-value function used is penalty-based, allowing the agent to act over the space of paths with minimal penalties. A forgetting mechanism is implemented that allows the agent to re-explore paths previously determined to be suboptimal. Simulation results are used to analyze the behavior of the proposed algorithm in a dynamic environment.
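Editor_Sketch :
The record does not include the paper's implementation, but the ingredients it names (Q-learning from the keyword list, penalty-based values, and a forgetting mechanism in a dynamic grid world) can be illustrated with a minimal, hedged sketch. The grid size, penalty magnitudes, learning parameters, and the uniform decay used for forgetting below are all assumptions for illustration, not the authors' method.

# A minimal sketch (not the authors' code): tabular Q-learning on a small
# grid world with penalty-based values and a simple "forgetting" decay.
# Grid layout, penalties, and the FORGET constant are assumptions.
import random

ROWS, COLS = 4, 4
GOAL = (3, 3)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
FORGET = 0.01            # fraction of each Q-value forgotten per step
STEP_PENALTY = -1.0      # every move incurs a small penalty
OBSTACLE_PENALTY = -10.0

Q = {((r, c), a): 0.0 for r in range(ROWS) for c in range(COLS)
     for a in range(len(ACTIONS))}

def step(state, action, obstacles):
    """Apply an action; return (next_state, penalty)."""
    dr, dc = ACTIONS[action]
    nr, nc = state[0] + dr, state[1] + dc
    if not (0 <= nr < ROWS and 0 <= nc < COLS):
        return state, OBSTACLE_PENALTY      # bumped the boundary
    if (nr, nc) in obstacles:
        return state, OBSTACLE_PENALTY      # blocked cell
    return (nr, nc), STEP_PENALTY

def choose(state):
    """Epsilon-greedy action selection over penalty-based Q-values."""
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: Q[(state, a)])

def run_episode(obstacles, max_steps=100):
    state = (0, 0)
    for _ in range(max_steps):
        a = choose(state)
        nxt, penalty = step(state, a, obstacles)
        best_next = max(Q[(nxt, b)] for b in range(len(ACTIONS)))
        Q[(state, a)] += ALPHA * (penalty + GAMMA * best_next - Q[(state, a)])
        # Forgetting: decay all values back toward their initial zero, so
        # penalties learned for paths that may have changed fade away and
        # previously "suboptimal" paths become worth exploring again.
        for k in Q:
            Q[k] *= (1.0 - FORGET)
        state = nxt
        if state == GOAL:
            break

# The environment changes mid-training: the obstacle moves at episode 100.
for episode in range(200):
    obstacles = {(1, 1)} if episode < 100 else {(2, 2)}
    run_episode(obstacles)

Because all rewards here are penalties, reaching the goal quickly minimizes accumulated penalty, matching the abstract's description of acting over the space of paths with minimal penalties; the per-step decay is one plausible reading of the forgetting mechanism, not the paper's exact formulation.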
Keywords :
learning (artificial intelligence); software agents; Q learning; dynamic grid-world problem; exploitation; exploration; forgetting mechanism; path-finding agent; penalty; reinforcement learning; state-value function; Acceleration; Algorithm design and analysis; Analytical models; Control systems; Genetic algorithms; Heuristic algorithms; Intelligent control; Intelligent systems; Learning systems; Systems engineering and theory;
Conference_Title :
Proceedings of the International Joint Conference on Neural Networks (IJCNN '01), 2001
Conference_Location :
Washington, DC
Print_ISBN :
0-7803-7044-9
DOI :
10.1109/IJCNN.2001.939499