• DocumentCode
    2664792
  • Title

    Document copy detection based on kernel method

  • Author

    Jun-Peng, Bao ; Shen Jun-Yi ; Xiao-Dong, Liu ; Hai-Yan, Liu ; Xiao-Di, Zhang

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Xi'an Jiaotong Univ., China
  • fYear
    2003
  • fDate
    26-29 Oct. 2003
  • Firstpage
    250
  • Lastpage
    255
  • Abstract
    We present the semantic sequence kernel (SSK) for detecting document plagiarism; it is derived from the string kernel (SK) and the word sequence kernel (WSK). SSK first finds the semantic sequences in documents and then uses a kernel function to calculate their similarity. SK and WSK consider only the gap between the first word and the last one, whereas SSK takes into account the position information of each common word. We believe SSK captures both local and global information, so that it makes considerable progress in detecting small partial plagiarism. We compare SSK with the relative frequency model and the semantic sequence model, which is a word-frequency-based model. The results show that SSK is excellent on the non-rewording corpus. It is also valid on the rewording corpus, with some loss of performance.
  • Keywords
    computational linguistics; document handling; string matching; document copy detection; document plagiarism detection; kernel method; relative frequency model; semantic sequence kernel; string kernel; word position information; word sequence kernel; Computer science; Frequency; Intellectual property; Kernel; Plagiarism; Protection; Prototypes; Resists; Stress; Support vector machines;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003 International Conference on
  • Conference_Location
    Beijing, China
  • Print_ISBN
    0-7803-7902-0
  • Type

    conf

  • DOI
    10.1109/NLPKE.2003.1275908
  • Filename
    1275908