DocumentCode :
168299
Title :
Reducing computational effort for plagiarism detection by using citation characteristics to limit retrieval space
Author :
Meuschke, Norman ; Gipp, Bela
Author_Institution :
Nat. Inst. of Inf., Tokyo, Japan
fYear :
2014
fDate :
8-12 Sept. 2014
Firstpage :
197
Lastpage :
200
Abstract :
This paper proposes a hybrid approach to plagiarism detection in academic documents that combines detection methods based on citations, semantic argument structure, and semantic word similarity with character-based methods to achieve higher detection performance for disguised forms of plagiarism. Currently available plagiarism detection software relies exclusively on text string comparisons. These systems find copies but fail to identify disguised plagiarism, such as paraphrases, translations, or idea plagiarism. Detection approaches that consider semantic similarity at the word and sentence level exist and have consistently achieved higher detection accuracy for disguised plagiarism than character-based approaches. However, the high computational effort of these semantic approaches makes them infeasible for real-world plagiarism detection scenarios. The proposed hybrid approach uses citation-based methods as a preliminary heuristic to reduce the retrieval space at a relatively low loss in detection accuracy. This preliminary step can then be followed by a computationally more expensive semantic and character-based analysis. We show that such a hybrid approach makes semantic plagiarism detection feasible, even on large collections, for the first time.
Keywords :
citation analysis; information retrieval; semantic Web; academic documents; character-based analysis; character-based methods; citation characteristics; citation-based methods; plagiarism detection methods; real-world plagiarism detection scenarios; retrieval space; semantic argument structure; semantic word similarity; sentence level; Algorithm design and analysis; Couplings; Handheld computers; Plagiarism; Semantics; Text analysis; Disguised Plagiarism; Large Scale Collections; Plagiarism Detection; Semantic Analysis
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Digital Libraries (JCDL), 2014 IEEE/ACM Joint Conference on
Conference_Location :
London
Type :
conf
DOI :
10.1109/JCDL.2014.6970168
Filename :
6970168
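The two-stage idea described in the abstract (a cheap citation-based filter that shrinks the retrieval space, followed by a costlier comparison on the survivors) can be illustrated with a small Python sketch. It is not the authors' implementation. As assumptions, the citation heuristic here is plain bibliographic coupling (the number of references two documents share), the expensive stage is a character 5-gram Jaccard comparison standing in for the paper's semantic and character-based analysis, and all names (`Document`, `detect`, `min_shared_refs`, `sim_threshold`) and threshold values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    text: str
    # Hypothetical representation: cited references as normalized IDs (e.g. DOIs).
    citations: set[str] = field(default_factory=set)


def bibliographic_coupling(a: Document, b: Document) -> int:
    """Cheap stage: count the references cited by both documents."""
    return len(a.citations & b.citations)


def char_ngrams(text: str, n: int = 5) -> set[str]:
    """Overlapping character n-grams of the lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def char_similarity(a: Document, b: Document, n: int = 5) -> float:
    """Costly stage (stand-in): Jaccard similarity of character n-gram sets."""
    ga, gb = char_ngrams(a.text, n), char_ngrams(b.text, n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def detect(suspicious: Document, collection: list[Document],
           min_shared_refs: int = 2,
           sim_threshold: float = 0.3) -> list[tuple[str, float]]:
    # Stage 1: keep only documents sharing enough references (reduced retrieval space).
    candidates = [d for d in collection
                  if bibliographic_coupling(suspicious, d) >= min_shared_refs]
    # Stage 2: run the more expensive text comparison on the candidates only.
    results = []
    for d in candidates:
        score = char_similarity(suspicious, d)
        if score >= sim_threshold:
            results.append((d.doc_id, score))
    # Highest-scoring matches first.
    return sorted(results, key=lambda r: r[1], reverse=True)
```

Any source document that shares fewer than `min_shared_refs` references with the suspicious document is never compared textually. That skipped work is where the saving comes from, and it is also where the "relatively low loss in detection accuracy" arises: a verbatim copy that cites nothing in common would be missed by this filter.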