DocumentCode :
2344984
Title :
Web Based Cross Language Plagiarism Detection
Author :
Kent, Chow Kok ; Salim, Naomie
Author_Institution :
Fac. of CS & Info. Sys., Univ. Teknol. Malaysia, Skudai, Malaysia
fYear :
2010
fDate :
28-30 Sept. 2010
Firstpage :
199
Lastpage :
204
Abstract :
As the Internet helps us cross language and cultural borders by providing different types of translation tools, cross-language plagiarism, also known as translation plagiarism, is bound to arise. In this paper, we propose a new approach to detecting cross-language plagiarism. To limit the scope of the proposed system, we consider Bahasa Melayu as the input language of the submitted query document and English as the target language of similar, possibly plagiarised documents. Input documents are translated into English using the Google Translate API before undergoing a pre-processing phase (stemming and removal of stop words). Tokenized documents are sent to the Google AJAX Search API to detect similar documents throughout the World Wide Web. Only the top ten sources retrieved by the Google Search API are considered as candidate source documents. We integrate the Stanford Parser and WordNet to determine the level of similarity between the suspected documents and the candidate source documents. After that, a detailed similarity analysis is performed and a report of the results is produced.
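The pre-processing and candidate-ranking steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the text is already in English, uses a tiny hypothetical stop-word list and a crude suffix stripper in place of a real stemmer, and substitutes plain Jaccard token overlap for the Stanford Parser and WordNet similarity step. The names crude_stem, preprocess, jaccard and rank_candidates are illustrative only.

```python
import re

# Tiny illustrative stop-word list (an assumption; the paper does not give one).
STOP_WORDS = {"a", "an", "and", "are", "as", "in", "is", "of", "on", "the", "to"}


def crude_stem(word):
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, drop stop words, and stem; returns a set of terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {crude_stem(t) for t in tokens if t not in STOP_WORDS}


def jaccard(a, b):
    """Jaccard overlap of two term sets (swapped in for semantic similarity)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_candidates(suspect, candidates, top_k=10):
    """Score each candidate against the suspect text and keep the top_k."""
    suspect_terms = preprocess(suspect)
    scored = [(jaccard(suspect_terms, preprocess(c)), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Calling rank_candidates(translated_text, retrieved_pages) would return (score, page) pairs ordered from most to least similar. The default top_k of 10 mirrors the top-ten cut-off mentioned in the abstract; the scoring itself is only a lexical placeholder for the semantic comparison the paper describes.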
Keywords :
Internet; Web sites; application program interfaces; document handling; grammars; language translation; natural language processing; query processing; search engines; Bahasa Melayu; English language; Google AJAX search API; Google translate API; Internet; Stanford parser; Tokenized documents; Web based cross language plagiarism detection; WordNet; World Wide Web; language translation tool; query document; Plagiarism; Stanford Parser; WordNet; cross language; semantic similarity;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational Intelligence, Modelling and Simulation (CIMSiM), 2010 Second International Conference on
Conference_Location :
Bali
Print_ISBN :
978-1-4244-8652-6
Electronic_ISBN :
978-0-7695-4262-1
Type :
conf
DOI :
10.1109/CIMSiM.2010.10
Filename :
5701845