Conference record number:
3704
Article title:
Using Deep Reinforcement Learning for the Grid World Problem with Moving Obstacles and Investigating the Effect of the Learning Rate on the Received Reward
Title in another language:
Deep Reinforcement Learning for Grid World with Changing Obstacles and Investigating the Effect of Learning Rate on Received Rewards
Authors:
Aliae Torghabe Mohammad Hasan (mh.olyaei@yahoo.com), Sadjad University of Technology; Jalali Hosein (h.j1388@yahoo.com), Sadjad University of Technology; Aliae Torghabe Ali (ali110vfx@yahoo.com), Ferdowsi University of Mashhad; Noori Amin (amin.noori@gmail.com), Sadjad University of Technology
Keywords:
Deep reinforcement learning, deep learning, Grid World, moving obstacle, Python
Conference title:
Fifth International Conference on Electrical and Computer Engineering with Emphasis on Indigenous Knowledge
Abstract:
This paper addresses the Grid World with Changing Obstacles (GWCO) problem using Deep Reinforcement Learning (Deep RL). In GWCO, obstacles move along specific paths, which makes the problem dynamic. Because the environment changes and the number of state-action pairs is large, Deep RL is used to solve it. The paper reviews Reinforcement Learning, Deep Learning, and Deep RL, along with some of their applications. In the final section, simulation results for three learning-rate values (α) are compared, and the authors conclude that a learning rate of 0.001 works best for the GWCO problem. The simulations are implemented in Python with TensorFlow.
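As a rough illustration of the setting the abstract describes, the sketch below models a grid world in which each obstacle cycles along a fixed path. It is not the authors' implementation: the class name `MovingObstacleGrid`, the 5x5 grid, the obstacle paths, and the reward values are all assumptions made for this example, since the record gives none of them. It does show why the state count grows: the obstacle positions have to be part of the state.

```python
from dataclasses import dataclass, field

# All names and numbers here are hypothetical; the record does not give the
# paper's grid size, obstacle paths, or reward values.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


@dataclass
class MovingObstacleGrid:
    size: int = 5
    goal: tuple = (4, 4)
    # Each obstacle cycles through its own fixed list of cells.
    paths: list = field(default_factory=lambda: [
        [(1, 1), (1, 2), (1, 3), (1, 2)],
        [(3, 0), (3, 1)],
    ])
    agent: tuple = (0, 0)
    t: int = 0  # time step, drives the obstacle positions

    def obstacles(self):
        """Current cell of every obstacle at time step t."""
        return [path[self.t % len(path)] for path in self.paths]

    def state(self):
        # Obstacle positions are part of the state, which is what makes
        # the number of state-action pairs large.
        return (self.agent, tuple(self.obstacles()))

    def step(self, action):
        """Move the agent, advance the obstacles, return (state, reward, done)."""
        dr, dc = ACTIONS[action]
        row = min(max(self.agent[0] + dr, 0), self.size - 1)
        col = min(max(self.agent[1] + dc, 0), self.size - 1)
        self.agent = (row, col)
        self.t += 1  # obstacles advance one cell along their paths
        if self.agent in self.obstacles():
            return self.state(), -1.0, True   # collision ends the episode
        if self.agent == self.goal:
            return self.state(), 1.0, True    # goal reached
        return self.state(), -0.01, False     # small per-step cost
```

A Deep RL agent would take `state()` as network input and call `step()` in its training loop; the learning rate α compared in the paper would then be the step size of the optimizer that updates the network.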