• DocumentCode
    1010540
  • Title
    Shared information and program plagiarism detection
  • Author
    Chen, Xin ; Francia, Brent ; Li, Ming ; McKinnon, Brian ; Seker, Amit
  • Author_Institution
    Dept. of Comput. Sci., Univ. of California, Riverside, CA, USA
  • Volume
    50
  • Issue
    7
  • fYear
    2004
  • fDate
    7/1/2004
  • Firstpage
    1545
  • Lastpage
    1551
  • Abstract
    A fundamental question in information theory and in computer science is how to measure similarity or the amount of shared information between two sequences. We have proposed a metric, based on Kolmogorov complexity, to answer this question and have proven it to be universal. We apply this metric in measuring the amount of shared information between two computer programs, to enable plagiarism detection. We have designed and implemented a practical system, SID (Software Integrity Diagnosis system), that approximates this metric by a heuristic compression algorithm. Experimental results demonstrate that SID has clear advantages over other plagiarism detection systems. The SID system server is online at http://software.bioinformatics.uwaterloo.ca/SID/.
  • Keywords
    computational complexity; data compression; data integrity; heuristic programming; software metrics; Kolmogorov complexity; computer programs; heuristic compression algorithm; program plagiarism detection; shared information; software integrity diagnosis system; Algorithm design and analysis; Application software; Bioinformatics; Computer science; Genomics; Information theory; Internet; Phylogeny; Plagiarism; Programming profession
  • fLanguage
    English
  • Journal_Title
    Information Theory, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9448
  • Type
    jour
  • DOI
    10.1109/TIT.2004.830793
  • Filename
    1306552