Article title:
Autonomous Driving in a Highway Environment Based on Policy Learning Using Distributional Reinforcement Learning Methods
Title in other language:
Policy-based Auto-Driving in Highway based on Distributional Reinforcement Learning Methods
Authors:
Mollaei, Mahdi (Iran University of Science and Technology, School of Automotive Engineering, Tehran, Iran); Amirkhani, Abdollah (Iran University of Science and Technology, School of Automotive Engineering, Tehran, Iran)
Keywords:
Distributional reinforcement learning, Self-driving car, Driver assistance systems
Persian abstract (English translation):
Abstract: This paper presents a reinforcement learning-based method for designing a supervisor for automated driving in a highway environment. Because highway driving conditions are stochastic, and in order to model more realistic driving conditions, the paper exploits the advantages of deep distributional reinforcement learning. For the first time, the distributional reinforcement learning methods Fully Parameterized Quantile Function (FQF) and Implicit Quantile Network (IQN) are proposed for learning driving policies. To train the agent, the use of camera data, LiDAR data, and a combination of the two is proposed; to use the combined data, a multi-input network architecture is employed. The proposed methods are evaluated in a highway driving simulator developed in Unity, in which the self-driving car is realized with the help of driver assistance systems. The agent is further evaluated on learning a driving policy that can select the correct action for steering the vehicle. To better evaluate the methods, two criteria, speed changes and lane changes, are examined for the learned driving policy. The results are compared with previously presented methods such as the deep Q-network (DQN) and the quantile regression deep Q-network (QR-DQN). They show that the proposed algorithms can learn suitable driving policies in the highway environment, and that FQF performs better than IQN and the other previously implemented methods.
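The abstract describes IQN and FQF agents that choose a driving action from a learned distribution of returns rather than from a single Q-value. The following is a minimal sketch of IQN-style greedy action selection, assuming one fixed state, a small discrete action set, and a hypothetical stand-in quantile function (toy normal distributions) in place of the trained network; it is illustrative only and not the authors' implementation. Q(s, a) is estimated as the mean of quantile values at K fractions sampled uniformly from (0, 1), and the action with the largest estimate is taken.

```python
import random
from statistics import NormalDist
from typing import Callable, List, Tuple

# Hypothetical stand-in for a trained IQN head: maps (action, tau) to the
# tau-quantile of the return for one fixed state. Not the paper's network.
QuantileFn = Callable[[int, float], float]


def sample_taus(k: int, rng: random.Random) -> List[float]:
    """Draw k quantile fractions uniformly from (0, 1)."""
    eps = 1e-6  # keep tau strictly inside (0, 1) so inverse CDFs stay finite
    return [rng.uniform(eps, 1.0 - eps) for _ in range(k)]


def greedy_action(quantile_fn: QuantileFn, n_actions: int,
                  k: int = 64, seed: int = 0) -> Tuple[int, List[float]]:
    """Estimate Q(s, a) as the mean of k sampled quantiles and act greedily."""
    rng = random.Random(seed)
    taus = sample_taus(k, rng)  # the same fractions are shared by all actions
    q_values = [sum(quantile_fn(a, t) for t in taus) / k
                for a in range(n_actions)]
    best = max(range(n_actions), key=lambda a: q_values[a])
    return best, q_values


# Made-up return distributions for three hypothetical actions
# (e.g. keep lane, change lane, slow down); illustrative numbers only.
TOY_RETURNS = [NormalDist(1.0, 2.0), NormalDist(3.0, 0.5), NormalDist(0.5, 0.1)]


def toy_quantile_fn(action: int, tau: float) -> float:
    """Inverse CDF of the toy return distribution for the given action."""
    return TOY_RETURNS[action].inv_cdf(tau)


action, q = greedy_action(toy_quantile_fn, n_actions=3)
print(action, [round(v, 2) for v in q])
```

In IQN proper, the quantile function is a network conditioned on the state and an embedding of tau; FQF instead learns the fractions themselves with a fraction proposal network, which this sketch does not model.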
English abstract:
This paper presents reinforcement learning-based methods for designing a supervisor for automated driving in the highway environment. Given the stochastic nature of highway driving, and to account for more realistic driving conditions, the benefits of deep distributional reinforcement learning have been exploited. In this paper, for the first time, the Fully Parameterized Quantile Function (FQF) and Implicit Quantile Network (IQN) distributional reinforcement learning methods are proposed for learning driving policies. To train the agent, camera data, LiDAR data, and their combination are proposed as inputs. To use the combination of the two types of data, we employ a multi-input network structure. To evaluate the proposed methods, we use a highway driving simulator developed in Unity. The self-driving car in the simulator is realized with the help of driver assistance systems. The agent is evaluated on learning a driving policy that can choose the right action to steer the car. To better evaluate the methods, we examine two criteria, speed changes and lane changes, for the learned driving policy. The results were compared with previously presented methods such as DQN and QR-DQN. The results show that the proposed algorithms can learn appropriate driving policies in the highway environment. The FQF method also performs better than IQN and the other previously implemented methods.
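QR-DQN, IQN and FQF, the distributional methods named in the abstract, are commonly trained with the quantile Huber loss, which regresses each predicted quantile toward its fraction of the target return distribution. The following is a minimal sketch in plain Python for a single transition, assuming kappa = 1 by default and no batching or networks; it is illustrative and not the authors' implementation. The pairwise TD error is delta_ij = r + gamma * Z_{tau'_j}(s', a*) - Z_{tau_i}(s, a), the loss is summed over predicted quantiles i and averaged over target samples j, and the FQF fraction-proposal loss is not shown.

```python
from typing import Sequence


def huber(delta: float, kappa: float = 1.0) -> float:
    """Huber loss L_kappa: quadratic for |delta| <= kappa, linear beyond."""
    a = abs(delta)
    return 0.5 * delta * delta if a <= kappa else kappa * (a - 0.5 * kappa)


def quantile_huber_loss(current: Sequence[float], taus: Sequence[float],
                        target: Sequence[float], kappa: float = 1.0) -> float:
    """Quantile Huber loss as used by QR-DQN and IQN.

    current[i]: predicted quantile Z_{tau_i}(s, a)
    taus[i]:    fraction tau_i of that prediction
    target[j]:  target sample r + gamma * Z_{tau'_j}(s', a*)
    """
    total = 0.0
    for theta_i, tau in zip(current, taus):
        for t_j in target:
            delta = t_j - theta_i  # pairwise TD error
            # The weight |tau - 1{delta < 0}| makes the loss asymmetric, so
            # minimising it drives theta_i toward (approximately, for
            # kappa > 0) the tau-quantile of the target distribution.
            weight = abs(tau - (1.0 if delta < 0 else 0.0))
            total += weight * huber(delta, kappa) / kappa
    # Summed over predicted quantiles, averaged over target samples.
    return total / len(target)


print(quantile_huber_loss([0.0], [0.5], [2.0]))
print(quantile_huber_loss([1.0], [0.9], [0.0]),
      quantile_huber_loss([-1.0], [0.9], [0.0]))
```

With tau = 0.9, the same absolute error costs nine times more when the prediction lies below the target than when it lies above it, which is what pulls a high-fraction quantile toward the upper tail of the return distribution.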
Journal title:
مهندسي برق و الكترونيك ايران (Iranian Electrical and Electronics Engineering)