Title :
Comparisons of keyphrase extraction methods in source retrieval of plagiarism detection
Author :
Hui Ning; Leilei Kong; Mingxing Wang; Cuixia Du; Haoliang Qi
Author_Institution :
College of Computer Science and Technology, Harbin Engineering University, China
Abstract :
In the source retrieval step of plagiarism detection, the rationale for keyword extraction is to select only those phrases or words that maximize the chance of retrieving source documents matching the suspicious document. TF-IDF (term frequency-inverse document frequency), weighted TF-IDF (in which a term's TF-IDF score is multiplied by a coefficient that depends on the term's position), passage-based TF-IDF, and passage-based weighted TF-IDF have been used as keyword extraction methods for source retrieval in several previous studies. According to those studies, TF-IDF computed over the full document and weighted TF-IDF achieve higher performance. However, our experiments show that the same keyword extraction method yields different retrieval results for different types of plagiarism, and that different methods applied to the same type of plagiarism yield significantly different results. In this study, we carry out further experiments on the above methods. All comparison experiments are implemented using the vector space model. Experimental results show that passage-based TF-IDF is the best choice.
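The four keyword extraction variants named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the fixed passage length, and the position coefficients (`head`, `boost`) are all assumptions chosen for illustration, and every function name below is hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenizer (assumption; no stopword removal)."""
    return text.lower().split()

def build_idf(corpus):
    """Smoothed inverse document frequency over a reference corpus."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(tokenize(doc)))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0
           for t, df in doc_freq.items()}
    unseen_idf = math.log(1 + n_docs) + 1.0  # terms absent from the corpus
    return idf, unseen_idf

def position_weights(n_tokens, head=10, boost=2.0):
    """Assumed weighting scheme: the first `head` tokens count `boost` times."""
    return [boost if i < head else 1.0 for i in range(n_tokens)]

def top_keywords(tokens, idf, unseen_idf, k, weights=None):
    """Rank terms by (optionally position-weighted) TF-IDF; return the top k."""
    if weights is None:
        weights = [1.0] * len(tokens)  # plain TF-IDF
    tf = Counter()
    for tok, w in zip(tokens, weights):
        tf[tok] += w
    scores = {t: f * idf.get(t, unseen_idf) for t, f in tf.items()}
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

def passage_keywords(text, idf, unseen_idf, passage_len=50, k=3, weighted=False):
    """Split the text into fixed-length passages and extract keywords from each."""
    tokens = tokenize(text)
    passages = [tokens[i:i + passage_len]
                for i in range(0, len(tokens), passage_len)]
    results = []
    for p in passages:
        w = position_weights(len(p)) if weighted else None
        results.append(top_keywords(p, idf, unseen_idf, k, weights=w))
    return results
```

Calling `top_keywords` on the tokens of a whole suspicious document corresponds to full-document TF-IDF, and passing `weights=position_weights(...)` gives the weighted variant. `passage_keywords` yields one keyword list per passage, each of which could be issued as a separate query to a retrieval system.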
Keywords :
"Plagiarism","Data mining","Computer science","Analytical models","Data models","Documentation","Algorithm design and analysis"
Conference_Title :
Computer Science and Network Technology (ICCSNT), 2015 4th International Conference on
DOI :
10.1109/ICCSNT.2015.7490831