DocumentCode :
1010540
Title :
Shared information and program plagiarism detection
Author :
Chen, Xin ; Francia, Brent ; Li, Ming ; McKinnon, Brian ; Seker, Amit
Author_Institution :
Dept. of Comput. Sci., Univ. of California, Riverside, CA, USA
Volume :
50
Issue :
7
fYear :
2004
fDate :
7/1/2004 12:00:00 AM
Firstpage :
1545
Lastpage :
1551
Abstract :
A fundamental question in information theory and in computer science is how to measure the similarity, or the amount of shared information, between two sequences. We have proposed a metric, based on Kolmogorov complexity, to answer this question and have proven it to be universal. We apply this metric to measuring the amount of shared information between two computer programs, in order to enable plagiarism detection. We have designed and implemented a practical system, SID (Software Integrity Diagnosis system), that approximates this metric by a heuristic compression algorithm. Experimental results demonstrate that SID has clear advantages over other plagiarism detection systems. The SID server is online at http://software.bioinformatics.uwaterloo.ca/SID/.
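The record does not give the metric's formula or describe SID's compressor. Kolmogorov complexity is uncomputable, so the abstract's approach replaces it with the output length of a compressor. The sketch below illustrates that idea under two loudly stated assumptions: a normalized form d(x, y) = 1 - (C(x) + C(y) - C(xy)) / C(xy), where C is compressed length and xy is concatenation, and Python's standard `zlib` as a stand-in for SID's heuristic compression algorithm. The function names `compressed_size` and `shared_info_distance` are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of `data` after zlib compression at level 9.

    zlib is an assumed stand-in for SID's heuristic compressor.
    """
    return len(zlib.compress(data, 9))


def shared_info_distance(x: bytes, y: bytes) -> float:
    """Hypothetical compression-based distance between two byte strings.

    Assumed form: d = 1 - (C(x) + C(y) - C(xy)) / C(xy).
    Values near 0.0 mean much shared content; values near 1.0 mean little.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # never 0: zlib output always has a header
    # C(x) + C(y) - C(xy) estimates the information x and y have in common.
    shared = cx + cy - cxy
    d = 1.0 - shared / cxy
    # A real compressor only approximates Kolmogorov complexity, so the raw
    # value can fall slightly outside [0, 1]; clamp it.
    return min(max(d, 0.0), 1.0)
```

Applied to two source files read as bytes, a value near 0 suggests heavily shared content and a value near 1 suggests little. A general-purpose byte compressor such as zlib is sensitive to identifier renaming and reformatting, which is one reason a plagiarism detector might compress a normalized representation of the program instead; this is a design consideration for the sketch, not a description of SID.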
Keywords :
computational complexity; data compression; data integrity; heuristic programming; software metrics; Kolmogorov complexity; computer programs; heuristic compression algorithm; program plagiarism detection; shared information; software integrity diagnosis system; Algorithm design and analysis; Application software; Bioinformatics; Computer science; Genomics; Information theory; Internet; Phylogeny; Plagiarism; Programming profession;
fLanguage :
English
Journal_Title :
Information Theory, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9448
Type :
jour
DOI :
10.1109/TIT.2004.830793
Filename :
1306552