DocumentCode :
670221
Title :
Selective chunking — Easy and effective way to estimate text similarity
Author :
Kucecka, Tomas ; Chuda, Daniela ; Samuhel, Patrik
Author_Institution :
Fac. of Inf. & Inf. Technol., Slovak Univ. of Technol., Bratislava, Slovakia
fYear :
2013
fDate :
19-21 Nov. 2013
Firstpage :
381
Lastpage :
385
Abstract :
Plagiarism is a serious problem, especially in the academic environment. We define plagiarism as the theft of somebody else's work or ideas. In this paper we focus on plagiarism in the domain of student assignments written in natural language. We propose an approach intended to identify copied fragments of text faster and more accurately than standard approaches. We first identify topic-related pairs of text documents and then select for further processing only those pairs that discuss a similar topic. We experimented with different chunking methods in the comparison process to overcome typical problems, such as shorter fragments of text copied from other documents. The results show that our approach is more suitable for plagiarism detection than a standard n-gram method.
Keywords :
educational administrative data processing; natural language processing; text analysis; natural language; plagiarism detection; selective chunking; similar topic; standard n-gram method; student assignments; text documents; text similarity estimation; Informatics; Plagiarism; Standards; Time complexity; Vectors;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational Intelligence and Informatics (CINTI), 2013 IEEE 14th International Symposium on
Conference_Location :
Budapest
Print_ISBN :
978-1-4799-0194-4
Type :
conf
DOI :
10.1109/CINTI.2013.6705226
Filename :
6705226