DocumentCode :
2441240
Title :
On how often code is cloned across repositories
Author :
Schwarz, Niko ; Lungu, Mircea ; Robbes, Romain
fYear :
2012
fDate :
2-9 June 2012
Firstpage :
1289
Lastpage :
1292
Abstract :
Detecting code duplication in large code bases, or even across project boundaries, is problematic due to the massive amount of data involved. Large-scale clone detection also opens new challenges beyond asking for the provenance of a single clone fragment, such as assessing the prevalence of code clones on the entire code base, and their evolution. We propose a set of lightweight techniques that may scale up to very large amounts of source code in the presence of multiple versions. The common idea behind these techniques is to use bad hashing to get a quick answer. We report on a case study, the Squeaksource ecosystem, which features thousands of software projects, with more than 40 million versions of methods, across more than seven years of evolution. We provide estimates for the prevalence of type-1, type-2, and type-3 clones in Squeaksource.
Keywords :
project management; software maintenance; Squeaksource ecosystem; bad hashing; code base; code clone prevalence; code duplication detection; code evolution; project boundaries; single clone fragment provenance; source code; Cloning; Ecosystems; Educational institutions; Indexes; Layout; Software; Clone detection; Software ecosystems
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Software Engineering (ICSE), 2012 34th International Conference on
Conference_Location :
Zurich
ISSN :
0270-5257
Print_ISBN :
978-1-4673-1066-6
Electronic_ISBN :
0270-5257
Type :
conf
DOI :
10.1109/ICSE.2012.6227097
Filename :
6227097