Conference record number :
4028
Article title :
Topic Modeling Based Plagiarism Detection in Persian Documents
Authors :
Mohammadian, Banafshe (Kharazmi University) ; Khakpour, Jafar (Kharazmi University) ; Pedram, Mir Mohsen (Kharazmi University, pedram@khu.ac.ir)
Keywords :
Plagiarism Detection , Topic Modeling , Latent Dirichlet Allocation , Singular Value Decomposition
Conference title :
8th National Seminar on Fuzzy Statistics and Probability (هشتمين همايش ملي سمينار آمار و احتمال فازي)
Abstract :
Plagiarism is a rapidly growing problem in the age of the internet and the information
explosion, which makes automatic techniques for detecting plagiarized documents crucially
important. Many machine learning algorithms have been proposed for automatically
detecting plagiarized documents in a large corpus. This research applies the vector
space model (VSM) and singular value decomposition (SVD) to this task: the VSM
represents the text corpus, and SVD is used to analyze the latent semantics of the
documents. Natural language processing techniques are used to preprocess and clean
the documents, and the documents are clustered by topic modeling to improve
detection accuracy. Experimental results show the effectiveness of the proposed
method.
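
The VSM and SVD steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the TF-IDF weighting, the number of latent dimensions `k`, the similarity threshold, and all function names are assumptions chosen for illustration, and the topic-clustering and Persian-specific preprocessing steps are omitted. It assumes NumPy is available.

```python
import math
import re
from collections import Counter

import numpy as np


def tokenize(text):
    # Hypothetical minimal tokenizer: lowercase runs of word characters.
    # Real Persian preprocessing (normalization, stemming) is not shown.
    return re.findall(r"\w+", text.lower())


def tfidf_matrix(docs):
    """Build a documents x terms TF-IDF matrix (the vector space model)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: j for j, t in enumerate(vocab)}
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    X = np.zeros((n_docs, len(vocab)))
    for i, toks in enumerate(tokenized):
        for term, count in Counter(toks).items():
            # Smoothed IDF (an assumed weighting, one of several in use).
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
            X[i, index[term]] = count * idf
    return X, vocab


def latent_vectors(X, k):
    """Project documents onto the top-k singular directions (truncated SVD)."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    k = min(k, len(s))
    return U[:, :k] * s[:k]


def cosine_similarities(Z):
    """Pairwise cosine similarity between the rows of Z."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty documents
    Zn = Z / norms
    return Zn @ Zn.T


def suspicious_pairs(docs, k=2, threshold=0.9):
    """Return (i, j, similarity) for document pairs at or above threshold."""
    X, _ = tfidf_matrix(docs)
    S = cosine_similarities(latent_vectors(X, k))
    return [
        (i, j, float(S[i, j]))
        for i in range(len(docs))
        for j in range(i + 1, len(docs))
        if S[i, j] >= threshold
    ]
```

Calling `suspicious_pairs(list_of_texts)` returns candidate pairs for manual review. In practice `k` and `threshold` would need tuning on labeled data, and a high latent-space similarity is only a signal of possible plagiarism, not proof of it.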