DocumentCode :
2823163
Title :
A New Normalized Similarity for Discriminating Similar Documents
Author :
Ji, Jeong-Hoon ; Ryu, Chang-Keon ; Woo, Gyun ; Cho, Hwan-Gue
Author_Institution :
Dept. of Comput. Eng., Pusan Nat. Univ., Pusan
Volume :
2
fYear :
2008
fDate :
2-4 Sept. 2008
Firstpage :
108
Lastpage :
113
Abstract :
To find similar document pairs in a set of documents, computing normalized similarities is unavoidable because document sizes vary from one document to another. However, the normalized similarities proposed so far remain unreliably sensitive to the size of the programs compared. As a result, most previously published similarity detection tools have difficulty determining the cutoff threshold that discriminates similar documents from the rest of a set. In this paper, we propose a new normalized similarity based on the Weibull distribution. To test the effectiveness of the new similarity measure, we applied it to detecting similar program pairs in a set of programs. In the experiment, the new similarity measure showed favorable characteristics in discriminating the very similar program pairs from other pairs. The proposed normalized similarity is also effective in detecting similar documents written in natural languages.
Keywords :
Weibull distribution; document handling; natural language processing; natural languages; normalization similarities; normalized similarities; plagiarism detection; similar document discrimination; similarity detection tools; Automatic programming; Biology computing; Clustering algorithms; Computer networks; Electronic mail; Information management; Plagiarism; Sequences; ICPC; Programming Contest; Weibull
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Networked Computing and Advanced Information Management, 2008. NCM '08. Fourth International Conference on
Conference_Location :
Gyeongju
Print_ISBN :
978-0-7695-3322-3
Type :
conf
DOI :
10.1109/NCM.2008.189
Filename :
4624126