Clustering Scientific Articles Based on the k_means Algorithm. Case Study: The Database of the Iranian Research Institute for Information Science and Technology (IranDoc)

Title in another language

Clustering Scientific Articles based on the K_means Algorithm Case Study: Iranian Research Institute for Information Science and Technology (IranDoc)

Authors

Soleimani Nejad, Adel (Shahid Bahonar University of Kerman, Department of Knowledge and Information Science); Salajegheh, Mozhdeh (Shahid Bahonar University of Kerman, Department of Knowledge and Information Science); Tayebi Nia, Elham (Shahid Bahonar University of Kerman)

Number of pages

From page

871

To page

896

Keywords

Text mining, clustering, k_means algorithm, Euclidean distance function criterion, IranDoc database

Persian abstract

With the ever-growing volume of resources and articles on the Web, fast and inexpensive methods for reaching the desired texts within this vast collection of documents are becoming more important. To that end, text mining techniques are a valuable step toward discovering knowledge from textual documents. The main objective of this research is to cluster the database of the Iranian Research Institute for Information Science and Technology (IranDoc) using text mining techniques, so that the available articles are divided into several clusters in which articles from different clusters differ as much as possible and articles within each cluster are as similar as possible. Articles from fields related to information technology were selected. For this purpose, all keywords of the information technology fields were first selected according to their frequency in the articles of the database, and the articles for each keyword were then extracted from the IranDoc database. The dataset was then built using the Notepad++ software. In this research, the k_means algorithm was used for clustering, and the Euclidean distance function criterion [1] was used to measure the similarity of clusters. The clustering results were then analyzed to discover the degree of similarity and a suitable pattern among the articles. The resulting pattern showed that the greatest similarity is between the articles of the data mining and neural network clusters, with a Euclidean distance of 1.365, and that the least similarity is between the articles of the optimization and image processing clusters, with a reported distance of 1.387. The knowledge gained from this research consists of clustering related articles by their highest and lowest degrees of similarity to one another, finding a new pattern for quick and easy access to similar articles, and discovering hidden relationships between different topics. This knowledge helps researchers identify more effectively the articles that are relevant to their specialty and similar to the topic under study.

English abstract

With the growing number of Web-based resources and articles, quick and inexpensive ways to reach the desired texts within this vast collection of documents are increasingly important. The main objective of this research is to cluster the database of the Iranian Research Institute for Information Science and Technology (IranDoc) using text mining techniques, so that the articles are divided into several clusters, articles in different clusters differ as much as possible, and articles within each cluster are as similar as possible. Articles from information technology-related fields were selected. For this purpose, all the keywords of the information technology fields were first selected based on their frequencies in the database articles, and the articles for each keyword were then extracted from the IranDoc database. The dataset was then created using the Notepad++ software. In this research, the k_means algorithm was used for clustering, and the Euclidean distance function criterion was used to measure the similarity of clusters. The clustering results were then analyzed to find the similarity and pattern among the articles. The pattern showed that the greatest similarity is between the articles of the data mining and neural network clusters, with a Euclidean distance of 1.365, and the least similarity is between the articles of the optimization and image processing clusters, with a distance of 1.387. The knowledge gained from this research consists of clustering related articles by their highest and lowest degrees of similarity to each other, finding a new pattern for quick and easy access to similar articles, and discovering hidden relationships between different topics. This knowledge helps researchers better identify articles that are related to their specialty and similar to the topic under study.
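The procedure summarized in the abstract (k_means clustering followed by Euclidean distances between clusters) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not say how articles were turned into vectors, so plain term counts are assumed here, and the four toy "articles", the function names, and the choice of k = 2 are all hypothetical.

```python
import math
import random
from collections import Counter


def vectorize(docs):
    """Assumed representation: term-count vectors over a shared vocabulary."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    vectors = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        vectors.append([float(counts[word]) for word in vocab])
    return vocab, vectors


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, iterations=100, seed=0):
    """Lloyd's k-means: assign to nearest centroid, then recompute means."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [-1] * len(vectors)
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: euclidean(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def centroid_distances(centroids):
    """Pairwise Euclidean distances between cluster centroids."""
    return {
        (i, j): euclidean(centroids[i], centroids[j])
        for i in range(len(centroids))
        for j in range(i + 1, len(centroids))
    }


# Hypothetical toy "articles", each reduced to a few keywords.
docs = [
    "data mining clustering",
    "data mining classification",
    "image processing segmentation",
    "image processing filtering",
]
vocab, vectors = vectorize(docs)
labels, centroids = kmeans(vectors, k=2)
distances = centroid_distances(centroids)
print(labels, distances)
```

A smaller distance between two centroids means the corresponding clusters are more alike, which is how the abstract reads its reported values of 1.365 and 1.387. Measuring the distance between centroids is itself an assumption; the record only states that Euclidean distance was used to measure the similarity of clusters.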

Publication year

1397

Journal title

Iranian Journal of Information Processing and Management

PDF file

7583490

Link to this record

https://search.isc.ac/dl/search/defaultta.aspx?DTC=8&DC=1054642