Regional Information Center for Science and Technology (RICEST) - Selective chunking — Easy and effective way to estimate text similarity

DocumentCode :

670221

Title :

Selective chunking — Easy and effective way to estimate text similarity

Author :

Kucecka, Tomas ; Chuda, Daniela ; Samuhel, Patrik

Author_Institution :

Fac. of Inf. & Inf. Technol., Slovak Univ. of Technol., Bratislava, Slovakia

fYear :

2013

fDate :

19-21 Nov. 2013

Firstpage :

381

Lastpage :

385

Abstract :

Plagiarism is a serious problem, especially in the academic environment. We define it as the theft of somebody else's work or ideas. In this paper we focus on plagiarism in the domain of student assignments written in natural language. We propose an approach intended to identify copied fragments of text faster and more accurately than standard approaches. We first identify topic-related pairs of text documents and then select for further processing only those pairs that discuss a similar topic. We experimented with different chunking methods in the comparison process to overcome typical problems, such as shorter fragments of text copied from other documents. The results show that our approach is more suitable for plagiarism detection than a standard n-gram method.
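The two-stage idea in the abstract (filter document pairs by topic first, then compare only the surviving pairs chunk by chunk) can be illustrated with a minimal Python sketch. The abstract does not say how topic relatedness is measured or which chunking methods were tested, so everything concrete here is an assumption made for illustration: bag-of-words cosine similarity as the topic filter, word trigrams as chunks, Jaccard overlap as the score, and the hypothetical names `find_suspicious_pairs` and `topic_threshold`. The chunking shown is the plain n-gram baseline the abstract compares against, not the authors' selective chunking.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity of two bag-of-words term-frequency vectors.

    Used here as an assumed stand-in for the topic-relatedness check.
    """
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def word_ngrams(tokens, n=3):
    """Set of overlapping word n-grams (the baseline chunking)."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(set_a, set_b):
    """Jaccard overlap of two chunk sets; 0.0 if either set is empty."""
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def find_suspicious_pairs(docs, topic_threshold=0.3, n=3):
    """Return (name_a, name_b, overlap) for topic-related pairs only.

    docs maps a document name to its text. Pairs whose topic similarity
    falls below topic_threshold (an assumed cut-off) are skipped, so the
    more expensive chunk comparison runs on fewer pairs.
    """
    tokens = {name: tokenize(text) for name, text in docs.items()}
    results = []
    for a, b in combinations(sorted(docs), 2):
        # Stage 1: cheap topic filter.
        if cosine_similarity(tokens[a], tokens[b]) < topic_threshold:
            continue
        # Stage 2: chunk-level comparison of the surviving pair.
        overlap = jaccard(word_ngrams(tokens[a], n), word_ngrams(tokens[b], n))
        results.append((a, b, overlap))
    # Highest overlap first.
    return sorted(results, key=lambda r: r[2], reverse=True)
```

Calling `find_suspicious_pairs` with a dictionary of assignment texts returns only the topic-related pairs, ranked by chunk overlap. Swapping `word_ngrams` for a different chunking function is the point where a method such as the paper's selective chunking would plug in.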

Keywords :

educational administrative data processing; natural language processing; text analysis; natural language; plagiarism detection; selective chunking; similar topic; standard n-gram method; student assignments; text documents; text similarity estimation; Informatics; Plagiarism; Standards; Time complexity; Vectors;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Computational Intelligence and Informatics (CINTI), 2013 IEEE 14th International Symposium on

Conference_Location :

Budapest

Print_ISBN :

978-1-4799-0194-4

Type :

conf

DOI :

10.1109/CINTI.2013.6705226

Filename :

6705226

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=670221