Title :
Statistical Analysis of ML-Based Paraphrase Detectors with Lexical Similarity Metrics
Author :
El-Alfy, El-Sayed M.
Author_Institution :
Coll. of Comput. Sci. & Eng., King Fahd Univ. of Pet. & Miner., Dhahran, Saudi Arabia
Abstract :
Paraphrase detection has several important applications in natural language processing, including language translation, text summarization, question answering, plagiarism detection, and online information retrieval. A number of metrics have been proposed in the literature to quantify the textual similarity between two sentences. However, each similarity metric, used alone, detects paraphrases with very low accuracy. Although some machine learning (ML) techniques have been deployed for paraphrase detection, no known study intensively benchmarks their performance on this problem under similar conditions. In this paper, we evaluate the utility of integrating five lexical similarity metrics with three standard machine learning paradigms to detect paraphrases. We apply statistical tests to compare the adopted ML-based paraphrase detectors on different datasets and to assess the significance of the differences in their performance.
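As a rough illustration of the idea of turning a sentence pair into a vector of lexical similarity scores, the following minimal sketch computes four common token-overlap measures. The abstract does not name the five metrics used in the paper, so the choice of Jaccard, Dice, overlap coefficient, and cosine similarity here is an assumption, as are the helper names `tokens` and `lexical_features`.

```python
import math
import re
from collections import Counter


def tokens(sentence):
    """Lowercase a sentence and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def lexical_features(s1, s2):
    """Return illustrative lexical similarity scores in [0, 1] for two sentences.

    These metrics are assumed examples, not necessarily the paper's five.
    """
    t1, t2 = tokens(s1), tokens(s2)
    a, b = set(t1), set(t2)
    if not a or not b:
        # No tokens on one side: define every similarity as zero.
        return {"jaccard": 0.0, "dice": 0.0, "overlap": 0.0, "cosine": 0.0}

    common = len(a & b)

    # Cosine similarity over raw term-frequency vectors.
    c1, c2 = Counter(t1), Counter(t2)
    dot = sum(c1[w] * c2[w] for w in c1)
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))

    return {
        "jaccard": common / len(a | b),
        "dice": 2 * common / (len(a) + len(b)),
        "overlap": common / min(len(a), len(b)),
        "cosine": dot / (norm1 * norm2),
    }


if __name__ == "__main__":
    pair = ("The company reported higher profits.",
            "Higher profits were reported by the company.")
    print(lexical_features(*pair))
```

A feature vector of this kind could then be passed to any standard classifier (the keywords mention support vector machines, for instance), with one vector per sentence pair and a binary paraphrase label as the target.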
Keywords :
learning (artificial intelligence); natural language processing; statistical analysis; ML-based paraphrase detectors; lexical similarity metrics; machine learning paradigms; paraphrase detection; Educational institutions; Kernel; Measurement; Niobium; Support vector machines; Training; Vectors
Conference_Title :
Information Science and Applications (ICISA), 2014 International Conference on
Conference_Location :
Seoul
Print_ISBN :
978-1-4799-4443-9
DOI :
10.1109/ICISA.2014.6847467