DocumentCode
1684049
Title
Large-Scale Inter-System Clone Detection Using Suffix Trees
Author
Koschke, Rainer
Author_Institution
Univ. of Bremen, Bremen, Germany
fYear
2012
Firstpage
309
Lastpage
318
Abstract
Detecting license violations of source code requires comparing a suspected system against a very large corpus of source code, for instance, the Debian source distribution. Thus, techniques for detecting suspiciously similar code must scale in terms of the resources needed. In addition, high precision of the detection is necessary because a human needs to inspect the results. The current approach to addressing the resource challenge is to create an index for the corpus, against which the suspected source code is compared. The index creation, however, is very costly. If the analysis is done only once, it may not be worth the effort. This paper demonstrates how suffix trees can be used to obtain a scalable comparison. Our evaluation shows that this approach is faster than current index-based techniques. In addition, this paper proposes a method to improve precision through user feedback and automated data mining.
Keywords
data mining; law; software maintenance; Debian source distribution; automated data mining; index creation; index-based techniques; large-scale inter-system clone detection; license violations; source code; suffix trees; suspiciously similar code; user feedback; Arrays; Cloning; Detectors; Indexes; Licenses; Search problems; Software; clone detection; code search; license violation detection;
fLanguage
English
Publisher
IEEE
Conference_Titel
Software Maintenance and Reengineering (CSMR), 2012 16th European Conference on
Conference_Location
Szeged
ISSN
1534-5351
Print_ISBN
978-1-4673-0984-4
Type
conf
DOI
10.1109/CSMR.2012.37
Filename
6178897