Title :
Document copy detection based on kernel method
Author :
Bao, Jun-Peng ; Shen, Jun-Yi ; Liu, Xiao-Dong ; Liu, Hai-Yan ; Zhang, Xiao-Di
Author_Institution :
Dept. of Comput. Sci. & Eng., Xi'an Jiaotong Univ., China
Abstract :
We present the semantic sequence kernel (SSK), a method for detecting document plagiarism that is derived from the string kernel (SK) and the word sequence kernel (WSK). SSK first finds the semantic sequences in documents and then uses a kernel function to calculate their similarity. SK and WSK only calculate the gap between the first word and the last one, whereas SSK takes into account the position information of each common word. We believe that SSK therefore contains both local and global information, so that it makes considerable progress in detecting small partial plagiarism. We compare SSK with the relative frequency model and with the semantic sequence model, which is a word-frequency-based model. The results show that SSK performs excellently on a non-rewording corpus. It is also valid on a rewording corpus, with some loss of performance.
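As an illustration of the span-based weighting that the abstract attributes to SK and WSK, the following is a minimal brute-force sketch of a gap-weighted subsequence kernel over word lists. It is not the SSK formula, which this record does not give. The function names, the subsequence length `n`, and the decay factor `lam` are assumptions chosen for the example.

```python
from itertools import combinations


def word_sequence_kernel(s, t, n=2, lam=0.5):
    """Gap-weighted subsequence kernel over two word lists (brute force).

    Each length-n word subsequence is weighted by lam ** span, where span
    is the distance from its first to its last word (inclusive). Only that
    span matters; the positions of the words in between are ignored.
    """
    def weighted_subsequences(words):
        feats = {}
        for idx in combinations(range(len(words)), n):
            span = idx[-1] - idx[0] + 1
            key = tuple(words[i] for i in idx)
            feats[key] = feats.get(key, 0.0) + lam ** span
        return feats

    fs = weighted_subsequences(s)
    ft = weighted_subsequences(t)
    # Inner product of the two sparse feature maps.
    return sum(w * ft[k] for k, w in fs.items() if k in ft)


def normalized_kernel(s, t, n=2, lam=0.5):
    """Kernel value scaled so that a document compared with itself scores 1."""
    denom = (word_sequence_kernel(s, s, n, lam)
             * word_sequence_kernel(t, t, n, lam)) ** 0.5
    return word_sequence_kernel(s, t, n, lam) / denom if denom else 0.0
```

For example, `word_sequence_kernel(["a", "x", "b"], ["a", "b"])` matches only the subsequence `("a", "b")`, and the inserted word `"x"` lowers the score solely by widening the span from 2 to 3. The enumeration is exponential in `n`, so the sketch is suitable only for short inputs.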
Keywords :
computational linguistics; document handling; string matching; document copy detection; document plagiarism detection; kernel method; relative frequency model; semantic sequence kernel; string kernel; word position information; word sequence kernel; Computer science; Frequency; Intellectual property; Kernel; Plagiarism; Protection; Prototypes; Resists; Stress; Support vector machines;
Conference_Title :
Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003 International Conference on
Conference_Location :
Beijing, China
Print_ISBN :
0-7803-7902-0
DOI :
10.1109/NLPKE.2003.1275908