• DocumentCode
    2636676
  • Title

    A Platform Framework for Cross-Lingual Text Relatedness Evaluation and Plagiarism Detection

  • Author

    Lee, Chung-Hong ; Wu, Chih-Hong ; Yang, Hsin-Chang

  • Author_Institution
    Dept. of Electr. Eng., Nat. Kaohsiung Univ. of Appl. Sci., Kaohsiung
  • fYear
    2008
  • fDate
    18-20 June 2008
  • Firstpage
    303
  • Lastpage
    303
  • Abstract
    Research on plagiarism detection methods for monolingual texts (e.g. English texts) has become well established in recent years. However, little attention has been paid to plagiarism detection in cross-lingual text collections (e.g. English and Chinese texts). In this paper we present a system platform for evaluating text similarity and relatedness in multilingual text collections for plagiarism detection. First, we used a number of selected texts from Chinese-English parallel corpora collected from the Internet to train text classifiers based on the Support Vector Machine (SVM) model. The trained classifiers can then classify multilingual texts of unknown category. Subsequently, the resulting categorized texts were measured using a language-neutral clustering technique based on the Self-Organizing Map (SOM) method to evaluate semantic similarity among texts. The preliminary results show that our platform framework has potential for cross-lingual text relatedness evaluation and plagiarism detection.
  • Keywords
    Internet; classification; computer crime; learning (artificial intelligence); natural languages; pattern clustering; self-organising feature maps; support vector machines; text analysis; Chinese-English parallel corpora; Internet; cross-lingual plagiarism detection; cross-lingual text relatedness evaluation; language-neutral clustering technique; monolingual text; multilingual text collection; self-organizing map method; support vector machine; text classifier training; text similarity evaluation; Information management; Internet; Machine learning; Natural languages; Plagiarism; Self organizing feature maps; Support vector machine classification; Support vector machines; Text categorization; Text recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Innovative Computing Information and Control, 2008. ICICIC '08. 3rd International Conference on
  • Conference_Location
    Dalian, Liaoning
  • Print_ISBN
    978-0-7695-3161-8
  • Electronic_ISBN
    978-0-7695-3161-8
  • Type

    conf

  • DOI
    10.1109/ICICIC.2008.76
  • Filename
    4603492