Clustering Web Documents Using an Ontology-Based Fuzzy Method

Title in another language

Clustering Web Documents Using Ontology-Based Fuzzy Method

Authors

Najmeh Sakhaee (n.sakhaee.star@gmail.com), Islamic Azad University, Karaj Branch; Fariba Salehi (Fariba.salehi@kiau.ac.ir), Islamic Azad University, Karaj Branch; Majid Khalilian (Khalilian@kiau.ac.ir), Islamic Azad University, Karaj Branch

Number of pages

Keywords

Clustering, web documents, mining, semantic web

Publication year

1397 (Solar Hijri calendar; 2018/2019 CE)

Conference title

Fourth International Conference on Web Research

Document language

Persian

Persian abstract

Web documents and pages on the Internet are growing rapidly. Search engines and web servers use various methods to find the desired web pages and documents among a massive volume of documents. Nevertheless, organizing and analyzing large volumes of data is challenging. The problem in web page retrieval is that information on the World Wide Web comes in different formats and from different sources. Selecting data accurately is essential, and matching it to user requests is a challenge in web mining. To provide an optimal solution for mining web documents, and for organizing and giving fast, accurate access to structured and semi-structured web documents and pages, this research proposes a new method. The proposed method is based on clustering and fuzzification of web documents and takes into account the semantics and structure of web pages. To reduce the dimensionality (the number of features), the method maps features to semantic domains. Results from implementing the proposed method in Python and MATLAB show that it is suitable for categorizing and organizing web documents in terms of cluster quality and density, and that it attains suitable values of the Davies-Bouldin and silhouette indices.

English abstract

Web documents and web pages are expanding rapidly. Search engines and web services use different methods to find web pages and documents within this massive collection. However, organizing and analyzing such a large amount of data is challenging. The problem with web page retrieval is that information on the World Wide Web comes in different formats and from different sources. Accurate data selection is essential, and matching the data to user requests is a challenge in web mining. To provide an optimal solution for mining web documents, and for organizing and providing quick, accurate access to structured and semi-structured web documents and pages, a new approach is proposed. The proposed method is based on clustering and fuzzification of web documents and on the semantics and structure of web pages. To reduce the number of dimensions (features), the method maps features to semantic domains. Results of implementing the proposed method in Python and MATLAB show that it categorizes and organizes web documents with suitable cluster quality and density, and that it achieves suitable values of the Davies-Bouldin and silhouette indices.
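
The abstract names two steps, mapping features to semantic domains and fuzzy clustering, without giving details. The sketch below is one possible reading, not the authors' implementation. It assumes a tiny hand-written term-to-domain lexicon (`DOMAIN_OF`, hypothetical, standing in for an ontology lookup) and uses standard fuzzy c-means as the clustering step, since the record does not say which fuzzy algorithm the paper uses.

```python
import math
import random

# Hypothetical lexicon standing in for an ontology lookup (an assumption,
# not taken from the paper): each known term belongs to one semantic domain.
DOMAIN_OF = {
    "goal": "sports", "match": "sports", "team": "sports",
    "stock": "finance", "bank": "finance", "market": "finance",
}
DOMAINS = sorted(set(DOMAIN_OF.values()))  # ['finance', 'sports']


def to_domain_vector(text):
    """Reduce a document to normalised counts per semantic domain."""
    counts = [0.0] * len(DOMAINS)
    for term in text.lower().split():
        domain = DOMAIN_OF.get(term)
        if domain is not None:  # terms outside the lexicon are ignored
            counts[DOMAINS.index(domain)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=50, seed=0):
    """Standard fuzzy c-means; returns (centers, memberships)."""
    rng = random.Random(seed)
    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        s = sum(row)
        u.append([v / s for v in row])
    dim = len(points[0])
    centers = []
    for _ in range(n_iter):
        # Centers are means of the points weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            weights = [u[i][k] ** m for i in range(len(points))]
            wsum = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / wsum
                for d in range(dim)
            ])
        # Memberships follow from relative distances to the centers.
        for i, p in enumerate(points):
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            for k in range(n_clusters):
                u[i][k] = 1.0 / sum(
                    (dists[k] / dists[j]) ** (2.0 / (m - 1.0))
                    for j in range(n_clusters)
                )
    return centers, u


docs = ["team match goal", "goal team", "bank stock market", "stock bank"]
vectors = [to_domain_vector(d) for d in docs]
centers, memberships = fuzzy_c_means(vectors, n_clusters=2)
```

Each row of `memberships` gives one document's degree of membership in every cluster, so a page that mixes topics can belong partly to several clusters. If scikit-learn is available, hard labels taken from the largest membership in each row could be scored with `sklearn.metrics.silhouette_score` and `sklearn.metrics.davies_bouldin_score`, the two indices named in the abstract.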

Country

Iran

Link to this document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=36&DC=314918