Conference record number:
4028
Article title:
Topic Modeling Based Plagiarism Detection in Persian Documents
Authors:
Mohammadian Banafshe (Kharazmi University), Khakpour Jafar (Kharazmi University), Pedram Mir Mohsen (Kharazmi University, pedram@khu.ac.ir)
Number of pages:
7
Keywords:
Plagiarism Detection , Topic Modeling , Latent Dirichlet Allocation , Singular Value Decomposition
Publication year:
1397
Conference title:
Eighth National Seminar on Fuzzy Statistics and Probability (هشتمين همايش ملي سمينار آمار و احتمال فازي)
Document language:
English
Persian abstract:
Plagiarism is a rapidly growing problem with the advent of the internet and the information explosion, which makes automatic techniques for detecting plagiarized documents crucially important. Many machine learning algorithms have been suggested for automatically detecting plagiarized documents in a large corpus. This research uses the vector space model (VSM) to represent a corpus of text data and singular value decomposition (SVD) to analyze the latent semantics of the documents. Natural language processing techniques are used to preprocess and clean the documents, and the documents are clustered by a topic modeling technique to improve detection accuracy. Experimental results show the effectiveness of the proposed method.
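The VSM-plus-SVD idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The toy English sentences, whitespace tokenization, plain TF-IDF weighting, and the helper names (tfidf_vectors, latent_coordinates, cosine) are all assumptions made for illustration; the paper's Persian preprocessing and topic-model clustering steps are not reproduced. To stay dependency-free, the rank-k SVD coordinates are obtained by power iteration on the document Gram matrix rather than by a library SVD routine.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Vector space model: one TF-IDF vector (list of floats) per document."""
    tokenized = [d.lower().split() for d in docs]  # assumption: whitespace tokens
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n / df[t]) for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[t] * idf[t] for t in vocab])
    return vectors


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


def latent_coordinates(vectors, k, iters=500):
    """Rank-k latent coordinates for each document.

    For a document-term matrix A = U S V^T these are the rows of U_k S_k,
    found here by power iteration with deflation on the Gram matrix A A^T.
    """
    n = len(vectors)
    gram = [[dot(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    coords = [[] for _ in range(n)]
    for _ in range(k):
        u = [float(i + 1) for i in range(n)]  # fixed start vector
        for _ in range(iters):
            w = [dot(row, u) for row in gram]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                break
            u = [x / norm for x in w]
        lam = dot(u, [dot(row, u) for row in gram])  # eigenvalue = sigma^2
        sigma = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i].append(u[i] * sigma)
        # Deflate so the next pass finds the next singular direction.
        gram = [[gram[i][j] - lam * u[i] * u[j] for j in range(n)]
                for i in range(n)]
    return coords


# Hypothetical toy corpus: documents 0 and 1 are near-copies, 2 is unrelated.
docs = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "stock markets fell sharply today",
]
vectors = tfidf_vectors(docs)
coords = latent_coordinates(vectors, k=2)

raw_sim = cosine(vectors[0], vectors[1])       # similarity in term space
latent_sim = cosine(coords[0], coords[1])      # similarity in latent space
unrelated_sim = cosine(coords[0], coords[2])   # near-copy vs unrelated text

print("term-space similarity (0 vs 1):", round(raw_sim, 3))
print("latent-space similarity (0 vs 1):", round(latent_sim, 3))
print("latent-space similarity (0 vs 2):", round(unrelated_sim, 3))
```

In this sketch, a pair of documents whose latent-space similarity exceeds some chosen threshold would be flagged as a plagiarism candidate; the threshold and the number of latent dimensions k are tuning choices that the abstract does not specify.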
Country:
Iran