• DocumentCode
    1684049
  • Title
    Large-Scale Inter-System Clone Detection Using Suffix Trees
  • Author
    Koschke, Rainer
  • Author_Institution
    Univ. of Bremen, Bremen, Germany
  • fYear
    2012
  • Firstpage
    309
  • Lastpage
    318
  • Abstract
    Detecting license violations in source code requires comparing a suspected system against a very large corpus of source code, for instance, the Debian source distribution. Thus, techniques for detecting suspiciously similar code must scale in terms of the resources needed. In addition, high precision of the detection is necessary because a human needs to inspect the results. The current approach to addressing the resource challenge is to create an index for the corpus, against which the suspected source code is compared. Index creation, however, is very costly. If the analysis is done only once, it may not be worth the effort. This paper demonstrates how suffix trees can be used to obtain a scalable comparison. Our evaluation shows that this approach is faster than current index-based techniques. In addition, this paper proposes a method to improve precision through user feedback and automated data mining.
  • Keywords
    data mining; law; software maintenance; Debian source distribution; automated data mining; index creation; index-based techniques; large-scale inter-system clone detection; license violations; source code; suffix trees; suspiciously similar code; user feedback; Arrays; Cloning; Detectors; Indexes; Licenses; Search problems; Software; clone detection; code search; license violation detection
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Software Maintenance and Reengineering (CSMR), 2012 16th European Conference on
  • Conference_Location
    Szeged
  • ISSN
    1534-5351
  • Print_ISBN
    978-1-4673-0984-4
  • Type
    conf
  • DOI
    10.1109/CSMR.2012.37
  • Filename
    6178897