Unsupervised Determination of Semantic Similarity Using a Random Walk on a Lexical Substitution Graph

English title

Unsupervised Semantic Similarity Estimation using Random Walk on Lexical Substitution Graph

Authors

Kaveh-Yazdi, Fatemeh (Yazd University, Department of Computer Engineering); Zare Bidoki, Ali Mohammad (Yazd University, Department of Computer Engineering); Pajoohan, Mohammad-Reza (Yazd University, Department of Computer Engineering)

Pages

From page

237

To page

249

Keywords

Semantic similarity, lexical substitution, substitution graph, random walk, corpus, Wikipedia

Abstract (translated from Persian)

This paper introduces a method for determining the semantic similarity of words from sparse corpora. By presenting the concept of indirect substitutability for the first time and implementing it through a graph of term substitutability, the method overcomes the sparsity of the context space in languages with more limited resources, such as Persian. In addition, the substitution graph needed for determining semantic similarity can be generated from text corpora in a language-independent way. Evaluation on the RG-65 test set, a dataset commonly used to assess the quality of semantic similarity estimation, shows that the Spearman correlation coefficient of this method is 0.03 to 0.13 higher than that of other successful unsupervised methods.
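The evaluation described above compares a method's similarity scores with human judgments using Spearman's rank correlation. The following is a minimal sketch of that computation in plain Python. The function names and the four score pairs are hypothetical illustrations; they are not taken from RG-65 or from the paper.

```python
def average_ranks(values):
    """Return 1-based ranks of `values`, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    std_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for four word pairs (made-up numbers).
human_scores = [3.9, 3.1, 0.4, 1.2]
model_scores = [0.8, 0.7, 0.1, 0.05]
rho = spearman(human_scores, model_scores)
```

Because only ranks matter, the two score lists may be on different scales, which is why this measure suits comparing graph-based scores with human ratings.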

English abstract

This paper introduces the indirect substitutability relation, for the first time, as a practical basis for estimating semantic similarity. The proposed method is unsupervised and recognizes substitutability between two terms through a third term whose lexical context is similar to that of each of them separately. To model this relation, a graph is built from substitutable pairs of terms, and the strength of the relation between any pair of terms is approximated by propagating a semantic score through this substitutability graph. The method is language independent and uses only textual corpora to generate the substitution graph, so it also supports semantic similarity estimation in languages that lack dense corpora. Experiments on the Persian RG-65 dataset show that the proposed method outperforms the baseline algorithms, improving Spearman's correlation by 0.03 to 0.13.
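The idea of propagating a score through a substitutability graph can be illustrated with a small sketch. The abstract does not specify the propagation scheme, so the code below assumes a random walk with restart computed by power iteration. The function names, the restart probability, the edge weights, and the toy terms are all hypothetical choices for illustration, not the authors' implementation.

```python
from collections import defaultdict


def build_graph(pairs):
    """Build an undirected weighted graph from (term_a, term_b, weight) triples.

    Each triple stands for a directly substitutable pair of terms; weights
    are assumed to be positive.
    """
    graph = defaultdict(dict)
    for a, b, weight in pairs:
        graph[a][b] = weight
        graph[b][a] = weight
    return graph


def random_walk_scores(graph, source, restart=0.15, steps=50):
    """Approximate random-walk-with-restart scores from `source`.

    At every step the walker follows an edge with probability proportional
    to its weight, or jumps back to `source` with probability `restart`.
    """
    scores = {node: 0.0 for node in graph}
    scores[source] = 1.0
    for _ in range(steps):
        nxt = {node: 0.0 for node in graph}
        for node, mass in scores.items():
            total = sum(graph[node].values())
            for neighbor, weight in graph[node].items():
                nxt[neighbor] += (1 - restart) * mass * weight / total
        nxt[source] += restart
        scores = nxt
    return scores


# Toy graph: "car" and "automobile" are never directly paired, but both
# are substitutable with "vehicle" (indirect substitutability).
pairs = [
    ("car", "vehicle", 1.0),
    ("automobile", "vehicle", 1.0),
    ("vehicle", "object", 0.2),
    ("object", "banana", 1.0),
]
graph = build_graph(pairs)
scores = random_walk_scores(graph, "car")
```

In this toy graph, "automobile" receives a higher score than "banana" when walking from "car", even though neither is a direct neighbor of "car". That is the effect the indirect relation is meant to capture when direct co-occurrence evidence is sparse.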

Publication year (Solar Hijri)

1397

Journal title

Tabriz Journal of Electrical Engineering (مهندسي برق دانشگاه تبريز)

PDF file

7441290

Link to this record

https://search.isc.ac/dl/search/defaultta.aspx?DTC=8&DC=1003933