Evaluating Semantic and Syntactic Similarity for Plagiarism Detection in English Using NLP

Authors

Khajeh Zadeh Mahsa (mahsakhz@gmail.com), Department of English Language Teaching, Ahvaz Branch, Islamic Azad University, Ahvaz, Iran; Zaifar Meisam, Department of English Language Teaching, Ahvaz Branch, Islamic Azad University, Ahvaz, Iran

Keywords

Semantic Similarity, Syntactic Similarity, Plagiarism, NLP

Publication Year

1402 (Solar Hijri calendar; 2023/2024 CE)

Conference

2nd National Conference on Digital Transformation and Intelligent Systems

Document Language

English

Abstract

Manually detecting plagiarism in the huge volume of published documents is not feasible. Existing automatic plagiarism detection tools mostly rely on lexical matching and miss the semantic and syntactic aspects of plagiarism. A particularly challenging area of plagiarism detection is semantic plagiarism, which combines lexical and syntactic transformations. NLP can be used to analyze semantic similarity and detect document plagiarism. Hybrid methods, which combine different kinds of algorithms, have proven more comprehensive. In this study, an existing hybrid similarity algorithm is improved, and a plagiarism detection method and a plagiarism score are defined to compare the plagiarism levels of documents. Results on the MASRP dataset show an improvement of a few percent in all similarity evaluation criteria: accuracy, precision, recall, and F-measure. Moreover, the document plagiarism score reflects the amount of plagiarism detected in the documents well. Tests on the CPSA corpus verify that the defined plagiarism score correlates with the level of plagiarism in the suspicious document.
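The record does not describe the improved algorithm itself. Purely as an illustration of the general idea, the following is a minimal sketch of a hybrid sentence similarity and a document-level plagiarism score, assuming a weighted mix of a lexical measure (Jaccard token overlap) and a simplified word-order measure. The function names, the weight `alpha = 0.85`, and the threshold `0.7` are hypothetical choices for this sketch, not the authors' method or parameters.

```python
import math
import re


def tokenize(sentence: str) -> list[str]:
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def lexical_similarity(a: list[str], b: list[str]) -> float:
    """Jaccard overlap of the two token sets, in [0, 1]."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def order_similarity(a: list[str], b: list[str]) -> float:
    """Simplified word-order similarity: 1 - ||r1 - r2|| / ||r1 + r2||.

    Each entry of r1/r2 is the 1-based first position of a joint-vocabulary
    word in the sentence, or 0 if absent (exact word matches only).
    """
    joint = sorted(set(a) | set(b))

    def order_vector(tokens: list[str]) -> list[int]:
        first_pos: dict[str, int] = {}
        for i, tok in enumerate(tokens, start=1):
            first_pos.setdefault(tok, i)
        return [first_pos.get(w, 0) for w in joint]

    r1, r2 = order_vector(a), order_vector(b)
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(r1, r2)))
    total = math.sqrt(sum((x + y) ** 2 for x, y in zip(r1, r2)))
    return 1.0 if total == 0 else 1.0 - diff / total


def hybrid_similarity(s1: str, s2: str, alpha: float = 0.85) -> float:
    """Hypothetical hybrid: weighted sum of lexical and word-order similarity."""
    a, b = tokenize(s1), tokenize(s2)
    return alpha * lexical_similarity(a, b) + (1 - alpha) * order_similarity(a, b)


def plagiarism_score(suspicious: list[str], source: list[str],
                     threshold: float = 0.7) -> float:
    """Fraction of suspicious sentences whose best match in the source
    reaches the (hypothetical) similarity threshold."""
    if not suspicious:
        return 0.0
    flagged = sum(
        1 for s in suspicious
        if max((hybrid_similarity(s, t) for t in source), default=0.0) >= threshold
    )
    return flagged / len(suspicious)


if __name__ == "__main__":
    suspicious = ["The cat sat on the mat.", "Quantum chromodynamics is hard."]
    source = ["the cat sat on the mat"]
    print(plagiarism_score(suspicious, source))  # prints 0.5
```

In this sketch, a reordered copy of a sentence keeps full lexical overlap but loses some word-order similarity, so it scores just below an exact copy. A real system would replace the exact-match lexical term with a semantic measure (for example, word embeddings or WordNet-based similarity).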

Country

Iran

Link to This Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=36&DC=356661