DocumentCode
260393
Title
Experiments on the Indonesian plagiarism detection using latent semantic analysis
Author
Soleman, Sidik ; Purwarianti, Ayu
Author_Institution
Sch. of Electr. Eng. & Inf., Bandung Inst. of Technol., Bandung, Indonesia
fYear
2014
fDate
28-30 May 2014
Firstpage
413
Lastpage
418
Abstract
Plagiarism detection is an important task because the number of plagiarism cases is increasing and plagiarism techniques are becoming harder to detect. Plagiarism is no longer only literal; there is also intelligence plagiarism. To handle intelligence plagiarism, we employed latent semantic analysis (LSA) as the term-document representation. LSA was used in both the Heuristic Retrieval (HR) component and the Detailed Analysis (DA) component. We conducted several experiments to compare token types, text segmentation methods, and threshold values. The test data were prepared manually from the available Indonesian paper corpus. Experimental results showed that LSA outperformed the vector space model (VSM), especially in test cases with intelligence plagiarism.
Keywords
data analysis; text analysis; Indonesian paper corpus; Indonesian plagiarism detection; LSA; VSM; intelligence plagiarism; latent semantic analysis; term-document representation; text segmentation; threshold value; token type; vector space model; Communications technology; Matrix decomposition; Plagiarism; Sections; Semantics; System performance; Vectors;
fLanguage
English
Publisher
IEEE
Conference_Titel
Information and Communication Technology (ICoICT), 2014 2nd International Conference on
Conference_Location
Bandung
Type
conf
DOI
10.1109/ICoICT.2014.6914098
Filename
6914098