• DocumentCode
    169801
  • Title
    Statistical Analysis of ML-Based Paraphrase Detectors with Lexical Similarity Metrics
  • Author
    El-Alfy, El-Sayed M.
  • Author_Institution
    Coll. of Comput. Sci. & Eng., King Fahd Univ. of Pet. & Miner., Dhahran, Saudi Arabia
  • fYear
    2014
  • fDate
    6-9 May 2014
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    Paraphrase detection has several important applications in natural language processing. Examples of such applications include language translation, text summarization, question answering, plagiarism detection, and online information retrieval. A number of metrics have been proposed in the literature to quantify the textual similarity between two sentences. However, each similarity metric used on its own detects paraphrases with very low accuracy. Though some machine learning (ML) techniques have been deployed for paraphrase detection, there is no known study that intensively benchmarks their performance on this problem under similar conditions. In this paper, we evaluate the utility of integrating five lexical similarity metrics with three standard machine learning paradigms to detect paraphrases. We apply statistical tests to compare and benchmark the relative significance of the adopted ML-based paraphrase detectors on different datasets.
  • Keywords
    learning (artificial intelligence); natural language processing; statistical analysis; ML-based paraphrase detectors; lexical similarity metrics; machine learning paradigms; paraphrase detection; Educational institutions; Kernel; Measurement; Niobium; Support vector machines; Training; Vectors
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Science and Applications (ICISA), 2014 International Conference on
  • Conference_Location
    Seoul
  • Print_ISBN
    978-1-4799-4443-9
  • Type
    conf
  • DOI
    10.1109/ICISA.2014.6847467
  • Filename
    6847467