• DocumentCode
    3086283
  • Title

    Batch source-code plagiarism detection using an algorithm for the bounded longest common subsequence problem

  • Author

    Campos, R.A.C. ; Martinez, F.J.Z.

  • Author_Institution
    Dept. de Sist., UAM Azcapotzalco, Mexico City, Mexico
  • fYear
    2012
  • fDate
    26-28 Sept. 2012
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    Source-code plagiarism detection is an unfortunate but necessary activity when reviewing assignments of programming courses. While being reasonably easy to fool, string-based comparisons offer a high degree of accuracy with almost no false positives and usually a good string similarity metric is the length of their longest common subsequence. In the case of two strings, the dynamic programming algorithm for this calculation unfortunately takes quadratic time even if the strings are equal. In this paper we present an algorithm that, given a batch of source-code files, efficiently finds all pairs of similar files by preprocessing the files and then using a fast branch-and-bound algorithm to find only those pairs whose longest common subsequence is indicative of plagiarism.
  • Keywords
    dynamic programming; programming theory; string matching; batch source code plagiarism detection; bounded longest common subsequence problem; branch-and-bound algorithm; dynamic programming algorithm; false positives; source code file preprocessing; string similarity metric; string-based comparisons; Algorithm design and analysis; Data preprocessing; Dynamic programming; Heuristic algorithms; Indexes; Plagiarism; Programming; Plagiarism detection; branch and bound; longest common subsequence; source-code;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Electrical Engineering, Computing Science and Automatic Control (CCE), 2012 9th International Conference on
  • Conference_Location
    Mexico City
  • Print_ISBN
    978-1-4673-2170-9
  • Type

    conf

  • DOI
    10.1109/ICEEE.2012.6421180
  • Filename
    6421180