DocumentCode
2664792
Title
Document copy detection based on kernel method
Author
Bao, Jun-Peng ; Shen, Jun-Yi ; Liu, Xiao-Dong ; Liu, Hai-Yan ; Zhang, Xiao-Di
Author_Institution
Dept. of Comput. Sci. & Eng., Xi'an Jiaotong Univ., China
fYear
2003
fDate
26-29 Oct. 2003
Firstpage
250
Lastpage
255
Abstract
We present the semantic sequence kernel (SSK), a method for detecting document plagiarism that is derived from the string kernel (SK) and the word sequence kernel (WSK). SSK first identifies semantic sequences in documents and then uses a kernel function to compute their similarity. SK and WSK account only for the gap between the first and last words of a matched sequence, whereas SSK takes into account the position of each common word. We believe SSK captures both local and global information, which makes it considerably better at detecting small partial plagiarism. We compare SSK with the relative frequency model and with the semantic sequence model, a word-frequency-based model. The results show that SSK performs excellently on a non-rewording corpus. It also works on a rewording corpus, although its performance degrades somewhat.
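The abstract contrasts SSK with SK and WSK, which weight a shared word subsequence only by the distance between its first and last words. As a rough illustration of that baseline (not of SSK itself, whose formula the abstract does not give), the following is a minimal Python sketch of a gap-weighted word-sequence kernel computed with the standard dynamic-programming recursion for subsequence kernels. The function names `word_sequence_kernel` and `normalized_kernel`, the decay factor `lam`, and the subsequence length `n` are illustrative assumptions, not names taken from the paper.

```python
from math import sqrt
from typing import Sequence


def word_sequence_kernel(s: Sequence[str], t: Sequence[str], n: int, lam: float) -> float:
    """Gap-weighted word-sequence kernel K_n(s, t) (illustrative WSK-style baseline).

    Each common word subsequence u of length n, matched at some index tuple in s
    and some index tuple in t, contributes lam ** (span_s + span_t), where
    span = last matched position - first matched position + 1.
    Only the first and last positions affect the weight; interior ones do not.
    This is NOT the paper's SSK, whose exact formula is not given in the abstract.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    ls, lt = len(s), len(t)
    # kp[i][p][q] holds the auxiliary kernel K'_i(s[:p], t[:q]); K'_0 is 1 everywhere.
    kp = [[[0.0] * (lt + 1) for _ in range(ls + 1)] for _ in range(n)]
    for p in range(ls + 1):
        for q in range(lt + 1):
            kp[0][p][q] = 1.0
    for i in range(1, n):
        for p in range(1, ls + 1):
            kpp = 0.0  # running K''_i(s[:p], t[:q]), extended one word of t at a time
            for q in range(1, lt + 1):
                kpp *= lam
                if s[p - 1] == t[q - 1]:
                    kpp += lam * lam * kp[i - 1][p - 1][q - 1]
                kp[i][p][q] = lam * kp[i][p - 1][q] + kpp
    # Close every length-n subsequence with a final matching word pair.
    total = 0.0
    for p in range(1, ls + 1):
        for q in range(1, lt + 1):
            if s[p - 1] == t[q - 1]:
                total += lam * lam * kp[n - 1][p - 1][q - 1]
    return total


def normalized_kernel(s: Sequence[str], t: Sequence[str], n: int, lam: float) -> float:
    """Cosine-normalized kernel in [0, 1]; returns 0.0 if either side has no length-n subsequence."""
    kss = word_sequence_kernel(s, s, n, lam)
    ktt = word_sequence_kernel(t, t, n, lam)
    if kss == 0.0 or ktt == 0.0:
        return 0.0
    return word_sequence_kernel(s, t, n, lam) / sqrt(kss * ktt)


if __name__ == "__main__":
    t = ["a", "b", "c"]
    # Same first and last positions for "a b c", different interior position of "b".
    print(word_sequence_kernel(["a", "b", "x", "c"], t, 3, 0.5))
    print(word_sequence_kernel(["a", "x", "b", "c"], t, 3, 0.5))
```

In this sketch both documents receive the same score against `["a", "b", "c"]` (0.5 ** 7), because the shared subsequence "a b c" starts and ends at the same positions in each; only the interior word "b" moves. By the abstract's account, SSK is designed to take such per-word positions into account as well.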
Keywords
computational linguistics; document handling; string matching; document copy detection; document plagiarism detection; kernel method; relative frequency model; semantic sequence kernel; string kernel; word position information; word sequence kernel; Computer science; Frequency; Intellectual property; Kernel; Plagiarism; Protection; Prototypes; Resists; Stress; Support vector machines;
fLanguage
English
Publisher
IEEE
Conference_Titel
Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003 International Conference on
Conference_Location
Beijing, China
Print_ISBN
0-7803-7902-0
Type
conf
DOI
10.1109/NLPKE.2003.1275908
Filename
1275908