DocumentCode :
1684049
Title :
Large-Scale Inter-System Clone Detection Using Suffix Trees
Author :
Koschke, Rainer
Author_Institution :
Univ. of Bremen, Bremen, Germany
fYear :
2012
Firstpage :
309
Lastpage :
318
Abstract :
Detecting license violations of source code requires comparing a suspected system against a very large corpus of source code, for instance, the Debian source distribution. Thus, techniques for detecting suspiciously similar code must scale in terms of the resources needed. In addition, the detection must be highly precise because a human needs to inspect the results. The current approach to the resource challenge is to create an index of the corpus against which the suspected source code is compared. Index creation, however, is very costly, and if the analysis is done only once, it may not be worth the effort. This paper demonstrates how suffix trees can be used to obtain a scalable comparison. Our evaluation shows that this approach is faster than current index-based techniques. In addition, this paper proposes a method to improve precision through user feedback and automated data mining.
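The abstract contrasts indexing the whole corpus with a suffix-tree-based comparison. The sketch below assumes one plausible reading of that idea rather than the paper's actual algorithm. It builds the structure over the (small) suspected system only and then streams each corpus token sequence through it, so the large corpus is never indexed. For brevity it uses an uncompressed suffix trie, which takes quadratic space. A real suffix tree (for example one built with Ukkonen's algorithm) needs only linear space. The tokenizer and the names `tokenize`, `build_suffix_trie`, `find_clones`, and `min_len` are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def tokenize(code: str) -> List[str]:
    """Hypothetical lexer: identifiers/numbers and single punctuation characters."""
    return re.findall(r"\w+|[^\w\s]", code)


class TrieNode:
    __slots__ = ("children", "start")

    def __init__(self, start: int) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        # Start index of one suspect suffix whose path passes through this node,
        # so the node at depth d spells suspect[start:start + d].
        self.start = start


def build_suffix_trie(tokens: List[str]) -> TrieNode:
    """Insert every suffix of the suspect's tokens (uncompressed; O(n^2) nodes)."""
    root = TrieNode(0)
    for i in range(len(tokens)):
        node = root
        for tok in tokens[i:]:
            child = node.children.get(tok)
            if child is None:
                child = TrieNode(i)
                node.children[tok] = child
            node = child
    return root


def find_clones(root: TrieNode, corpus: List[str],
                min_len: int) -> List[Tuple[int, int, int]]:
    """Stream corpus tokens against the trie.

    Returns (corpus_pos, suspect_pos, length) for every maximal common token
    sequence of at least min_len tokens.
    """
    matches: List[Tuple[int, int, int]] = []
    prev_len = 0
    for j in range(len(corpus)):
        # Walk down as far as corpus[j:] agrees with some suspect substring.
        node, length = root, 0
        while j + length < len(corpus):
            child = node.children.get(corpus[j + length])
            if child is None:
                break
            node, length = child, length + 1
        # A match shorter than prev_len lies inside the match found at j - 1.
        if length >= min_len and length >= prev_len:
            matches.append((j, node.start, length))
        prev_len = length
    return matches
```

For example, if the suspect is `int sum(int a, int b) { return a + b; }` and a corpus file contains it verbatim after other code, `find_clones` with `min_len=10` reports one 16-token match. Making the suspect system the side that is indexed keeps memory proportional to the suspect rather than to the corpus.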
Keywords :
data mining; law; software maintenance; Debian source distribution; automated data mining; index creation; index-based techniques; large-scale inter-system clone detection; license violations; source code; suffix trees; suspiciously similar code; user feedback; Arrays; Cloning; Detectors; Indexes; Licenses; Search problems; Software; clone detection; code search; license violation detection;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Software Maintenance and Reengineering (CSMR), 2012 16th European Conference on
Conference_Location :
Szeged
ISSN :
1534-5351
Print_ISBN :
978-1-4673-0984-4
Type :
conf
DOI :
10.1109/CSMR.2012.37
Filename :
6178897