DocumentCode :
2636676
Title :
A Platform Framework for Cross-Lingual Text Relatedness Evaluation and Plagiarism Detection
Author :
Lee, Chung-Hong ; Wu, Chih-Hong ; Yang, Hsin-Chang
Author_Institution :
Dept. of Electr. Eng., Nat. Kaohsiung Univ. of Appl. Sci., Kaohsiung
fYear :
2008
fDate :
18-20 June 2008
Firstpage :
303
Lastpage :
303
Abstract :
Plagiarism detection methods for monolingual texts (e.g. English texts) have been well established in recent years. However, little attention has been paid to plagiarism detection in cross-lingual text collections (e.g. English and Chinese texts). In this paper we present a system platform for evaluating text similarity and relatedness in multilingual text collections for plagiarism detection. First, we used a number of selected texts from Chinese-English parallel corpora collected from the Internet to train text classifiers based on the Support Vector Machine (SVM) model, so that multilingual texts of unknown category can be classified by the trained classifiers. Subsequently, the resulting categorized texts were compared by means of a language-neutral clustering technique based on the Self-Organizing Map (SOM) method to evaluate the semantic similarity among texts. The preliminary results show that our platform framework has potential for cross-lingual text relatedness evaluation and plagiarism detection.
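The two-stage pipeline described in the abstract (SVM categorization followed by SOM-based relatedness measurement) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the feature representation, SVM kernel, or SOM configuration. The sketch assumes each text has already been mapped to a mean-centered, language-neutral feature vector (the toy values below are made up), uses a linear SVM trained with Pegasos-style subgradient updates, and uses a small one-dimensional SOM. All function names are hypothetical.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style subgradient descent on the hinge loss; labels are +1 / -1."""
    rng = random.Random(seed)
    w, t = [0.0] * len(X[0]), 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)                      # decaying step size
            violated = y[i] * dot(w, X[i]) < 1         # inside the margin?
            w = [(1 - eta * lam) * wj for wj in w]     # regularization shrink
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict(w, x):
    return 1 if dot(w, x) >= 0 else -1

def bmu(units, x):
    """Index of the best-matching unit (closest weight vector) for x."""
    return min(range(len(units)),
               key=lambda j: sum((u - xi) ** 2 for u, xi in zip(units[j], x)))

def train_som(data, n_units=4, epochs=100, seed=0):
    """Train a one-dimensional SOM with a linearly shrinking neighbourhood."""
    rng = random.Random(seed)
    units = [[rng.uniform(-1, 1) for _ in data[0]] for _ in range(n_units)]
    for e in range(epochs):
        decay = 1 - e / epochs
        lr, radius = 0.5 * decay, max(1, round(n_units / 2 * decay))
        for x in data:
            b = bmu(units, x)
            for j in range(n_units):
                if abs(j - b) <= radius:               # update BMU and neighbours
                    units[j] = [u + lr * (xi - u) for u, xi in zip(units[j], x)]
    return units

def map_distance(units, x1, x2):
    """Relatedness proxy: grid distance between the two documents' BMUs."""
    return abs(bmu(units, x1) - bmu(units, x2))

# Toy, made-up feature vectors for two categories (+1 and -1).
train_X = [[0.8, -0.7], [0.9, -0.9], [0.6, -0.8],
           [-0.7, 0.8], [-0.9, 0.6], [-0.8, 0.9]]
train_y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(train_X, train_y)

# Stage 1: categorize texts of unknown category (hypothetical documents).
english_doc, chinese_doc, other_doc = [0.7, -0.8], [0.75, -0.7], [-0.8, 0.7]
same_category = [d for d in (english_doc, chinese_doc, other_doc)
                 if predict(w, d) == 1]

# Stage 2: compare texts within one category on a SOM trained on that category.
category_vectors = [x for x, label in zip(train_X, train_y) if label == 1]
units = train_som(category_vectors + same_category)
print("map distance:", map_distance(units, english_doc, chinese_doc))
```

A small map distance between two texts in the same category would flag the pair as candidates for closer inspection; any threshold for doing so would have to be tuned on real data.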
Keywords :
Internet; classification; computer crime; learning (artificial intelligence); natural languages; pattern clustering; self-organising feature maps; support vector machines; text analysis; Chinese-English parallel corpora; Internet; cross-lingual plagiarism detection; cross-lingual text relatedness evaluation; language-neutral clustering technique; monolingual text; multilingual text collection; self-organizing map method; support vector machine; text classifier training; text similarity evaluation; Information management; Internet; Machine learning; Natural languages; Plagiarism; Self organizing feature maps; Support vector machine classification; Support vector machines; Text categorization; Text recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Innovative Computing Information and Control, 2008. ICICIC '08. 3rd International Conference on
Conference_Location :
Dalian, Liaoning
Print_ISBN :
978-0-7695-3161-8
Electronic_ISBN :
978-0-7695-3161-8
Type :
conf
DOI :
10.1109/ICICIC.2008.76
Filename :
4603492