• DocumentCode
    3487610
  • Title

    Achieving Linguistic Provenance via Plagiarism Detection

  • Author

    Idika, Nwokedi ; Phan, Huy Anh ; Varia, Mayank

  • Author_Institution
    Lincoln Lab., MIT, Lexington, MA, USA
  • fYear
    2013
  • fDate
    25-28 Aug. 2013
  • Firstpage
    648
  • Lastpage
    652
  • Abstract
    To go beyond what current provenance systems can capture for natural language text documents, we propose the Lincoln Laboratory Plagiarism for Provenance System (LLPla) as an approach for capturing linguistic provenance. Linguistic provenance infers the origin of text based on its linguistic structure. We take a plagiarism detection approach to this task, as identifying similar sections of text is fundamental to linguistic provenance and central to LLPla's performance. Thus, to determine the most viable plagiarism detection algorithm for use in LLPla, we evaluate three state-of-the-art plagiarism detection algorithms. Moreover, we propose extensions to the best-performing algorithm that improve its precision with negligible effects on recall.
  • Keywords
    graph theory; linguistics; natural language processing; text analysis; LLPla approach; Lincoln Laboratory Plagiarism for Provenance System; linguistic provenance; linguistic structure; natural language text documents; plagiarism detection approach; recall effect; text origin; Conferences; Detection algorithms; Generators; Laboratories; Plagiarism; Pragmatics; Probabilistic logic; graphs; plagiarism detection; provenance;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
  • Conference_Location
    Washington, DC
  • ISSN
    1520-5363
  • Type

    conf

  • DOI
    10.1109/ICDAR.2013.133
  • Filename
    6628698