Title :
Achieving Linguistic Provenance via Plagiarism Detection
Author :
Idika, Nwokedi ; Phan, Huy Anh ; Varia, Mayank
Author_Institution :
Lincoln Lab., MIT, Lexington, MA, USA
Abstract :
To go beyond what current provenance systems can capture for natural language text documents, we propose the Lincoln Laboratory Plagiarism for Provenance System (LLPla) as an approach for capturing linguistic provenance. Linguistic provenance infers the origin of text based on its linguistic structure. We take a plagiarism detection approach to this task, as identifying similar sections of text is fundamental to linguistic provenance and central to LLPla's performance. Thus, to determine the most viable plagiarism detection algorithm for use in LLPla, we evaluate three state-of-the-art plagiarism detection algorithms. Moreover, we propose extensions to the best-performing algorithm that improve its precision with negligible effects on recall.
Keywords :
graph theory; linguistics; natural language processing; text analysis; LLPla approach; Lincoln Laboratory Plagiarism for Provenance System; linguistic provenance; linguistic structure; natural language text documents; plagiarism detection approach; recall effect; text origin; Conferences; Detection algorithms; Generators; Laboratories; Plagiarism; Pragmatics; Probabilistic logic; graphs; plagiarism detection; provenance;
Conference_Title :
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location :
Washington, DC
DOI :
10.1109/ICDAR.2013.133