• DocumentCode
    260393
  • Title
    Experiments on the Indonesian plagiarism detection using latent semantic analysis
  • Author
    Soleman, Sidik ; Purwarianti, Ayu
  • Author_Institution
    Sch. of Electr. Eng. & Inf., Bandung Inst. of Technol., Bandung, Indonesia
  • fYear
    2014
  • fDate
    28-30 May 2014
  • Firstpage
    413
  • Lastpage
    418
  • Abstract
    Plagiarism detection is an important task, since the number of plagiarism cases is increasing and plagiarism techniques are becoming more difficult to detect. Plagiarism is no longer only literal; it also includes intelligence plagiarism. To handle intelligence plagiarism, we employed latent semantic analysis (LSA) as the term-document representation. LSA was used in both the Heuristic Retrieval (HR) component and the Detailed Analysis (DA) component. We conducted several experiments to compare token types, text segmentation methods, and threshold values. The test data were prepared manually from an available Indonesian paper corpus. Experimental results showed that LSA outperformed the VSM (Vector Space Model), especially in test cases involving intelligence plagiarism.
  • Keywords
    data analysis; text analysis; Indonesian paper corpus; Indonesian plagiarism detection; LSA; VSM; intelligence plagiarism; latent semantic analysis; term-document representation; text segmentation; threshold value; token type; vector space model; Communications technology; Matrix decomposition; Plagiarism; Sections; Semantics; System performance; Vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information and Communication Technology (ICoICT), 2014 2nd International Conference on
  • Conference_Location
    Bandung
  • Type
    conf
  • DOI
    10.1109/ICoICT.2014.6914098
  • Filename
    6914098
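The abstract contrasts LSA with the VSM as term-document representations. As an illustration only, the following Python sketch shows why LSA can flag a pair of documents that share no terms, while plain VSM cosine similarity cannot. It assumes NumPy, whitespace tokenization, raw term-frequency weighting, a hypothetical latent rank k, and a hypothetical similarity threshold of 0.8. None of these settings come from the paper, the function names are hypothetical, and this is not the authors' HR or DA implementation.

```python
import numpy as np

# Illustrative sketch only: weighting, rank k and threshold are assumptions,
# not the settings used in the paper.


def term_document_matrix(docs):
    """Raw term-frequency matrix with one row per term and one column per document."""
    vocab = sorted({tok for doc in docs for tok in doc.lower().split()})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc.lower().split():
            matrix[index[tok], j] += 1.0
    return matrix, vocab


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return float(np.dot(u, v) / (nu * nv))


def vsm_vectors(matrix):
    """VSM: each document is simply its term-frequency column."""
    return matrix.T


def lsa_vectors(matrix, k):
    """LSA: truncated SVD A ~ U_k S_k V_k^T; documents become rows of (S_k V_k^T)^T."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    k = min(k, len(s))
    return (np.diag(s[:k]) @ vt[:k, :]).T


def flag_pairs(vectors, threshold):
    """Return (i, j) document pairs whose cosine similarity reaches the threshold."""
    pairs = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                pairs.append((i, j))
    return pairs


# Toy corpus: "mobil" (car) and "kendaraan" (vehicle) co-occur only in the third document.
docs = ["mobil", "kendaraan", "mobil kendaraan"]
A, vocab = term_document_matrix(docs)
print(flag_pairs(vsm_vectors(A), 0.8))     # []
print(flag_pairs(lsa_vectors(A, 1), 0.8))  # [(0, 1), (0, 2), (1, 2)]
```

Because the two terms co-occur in the third document, the rank-1 projection places all three documents on one latent axis, so the first two documents are flagged even though they share no term. Keeping every latent dimension (k equal to the matrix rank) reproduces the VSM cosine scores exactly, so k and the threshold together control how far the comparison generalizes beyond literal term overlap.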