DocumentCode
1010540
Title
Shared information and program plagiarism detection
Author
Chen, Xin ; Francia, Brent ; Li, Ming ; McKinnon, Brian ; Seker, Amit
Author_Institution
Dept. of Comput. Sci., Univ. of California, Riverside, CA, USA
Volume
50
Issue
7
fYear
2004
fDate
7/1/2004 12:00:00 AM
Firstpage
1545
Lastpage
1551
Abstract
A fundamental question in information theory and in computer science is how to measure similarity, or the amount of shared information, between two sequences. We have proposed a metric, based on Kolmogorov complexity, to answer this question and have proven it to be universal. We apply this metric to measure the amount of shared information between two computer programs, to enable plagiarism detection. We have designed and implemented a practical system, SID (Software Integrity Diagnosis system), that approximates this metric with a heuristic compression algorithm. Experimental results demonstrate that SID has clear advantages over other plagiarism detection systems. The SID system server is online at http://software.bioinformatics.uwaterloo.ca/SID/.
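As a rough illustration of the idea in the abstract, the sketch below approximates shared information with a general-purpose compressor. It is not SID's algorithm: it substitutes the normalized compression distance (NCD) and Python's `zlib` for the paper's metric and its heuristic compressor, and the names `compressed_size` and `ncd` are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed input, used as a crude stand-in
    for Kolmogorov complexity (assumption: zlib, not SID's compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Values near 0 suggest that x and y share most of their information;
    values near 1 suggest they share very little.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Two toy "programs" that differ only in a variable name.
    original = b"for i in range(10):\n    total += i * i\n" * 20
    renamed = original.replace(b"total", b"acc")
    print(ncd(original, renamed))
```

A real detector would typically tokenize the programs first so that renaming identifiers or reformatting does not hide the overlap; this sketch compares raw bytes only.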
Keywords
computational complexity; data compression; data integrity; heuristic programming; software metrics; Kolmogorov complexity; computer programs; heuristic compression algorithm; program plagiarism detection; shared information; software integrity diagnosis system; Algorithm design and analysis; Application software; Bioinformatics; Computer science; Genomics; Information theory; Internet; Phylogeny; Plagiarism; Programming profession
fLanguage
English
Journal_Title
Information Theory, IEEE Transactions on
Publisher
IEEE
ISSN
0018-9448
Type
jour
DOI
10.1109/TIT.2004.830793
Filename
1306552