DocumentCode
3487610
Title
Achieving Linguistic Provenance via Plagiarism Detection
Author
Idika, Nwokedi ; Phan, Huy Anh ; Varia, Mayank
Author_Institution
Lincoln Lab., MIT, Lexington, MA, USA
fYear
2013
fDate
25-28 Aug. 2013
Firstpage
648
Lastpage
652
Abstract
To go beyond what current provenance systems can capture for natural language text documents, we propose the Lincoln Laboratory Plagiarism for Provenance System (LLPla) as an approach for capturing linguistic provenance. Linguistic provenance infers the origin of text based on its linguistic structure. We take a plagiarism detection approach to this task, as identifying similar sections of text is fundamental to linguistic provenance and central to LLPla's performance. Thus, to determine the most viable plagiarism detection algorithm for use in LLPla, we evaluate three state-of-the-art plagiarism detection algorithms. Moreover, we propose extensions to the best-performing algorithm that improve its precision with negligible effects on recall.
Keywords
graph theory; linguistics; natural language processing; text analysis; LLPla approach; Lincoln Laboratory Plagiarism for Provenance System; linguistic provenance; linguistic structure; natural language text documents; plagiarism detection approach; recall effect; text origin; Conferences; Detection algorithms; Generators; Laboratories; Plagiarism; Pragmatics; Probabilistic logic; graphs; plagiarism detection; provenance;
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location
Washington, DC
ISSN
1520-5363
Type
conf
DOI
10.1109/ICDAR.2013.133
Filename
6628698