Title :
Scaling classical clone detection tools for ultra-large datasets: An exploratory study
Author :
Svajlenko, Jeffrey ; Keivanloo, Iman ; Roy, Chanchal K.
Author_Institution :
Dept. of Comput. Sci., Univ. of Saskatchewan, Saskatoon, SK, Canada
Abstract :
Detecting clones in large datasets is an interesting research topic for a number of reasons. However, building scalable clone detection tools is challenging, and it is often impossible to use existing state-of-the-art tools on such large datasets. In this research, we investigated the use of our Shuffling Framework for scaling classical clone detection tools to ultra-large datasets. The framework achieves scalability on standard hardware by partitioning the dataset and shuffling the partitions over a number of detection rounds. This approach requires no modification to the subject tools, which allows their individual strengths and precision to be captured at an acceptable loss of recall. In our study, we explored the performance and applicability of the framework for six clone detection tools. The clones found during our experiment were used to comment on the cloning habits of the global Java open-source development community.
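The partition-and-shuffle idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `shuffle_detect`, the parameters `partition_size` and `rounds`, and the stand-in detector `toy_detect` are all hypothetical, and the sketch assumes the subject tool can be wrapped as a function that takes a list of files and returns clone pairs.

```python
import random
from typing import Callable, Iterable, List, Set, Tuple

ClonePair = Tuple[str, str]


def shuffle_detect(
    files: Iterable[str],
    detect: Callable[[List[str]], Iterable[ClonePair]],
    partition_size: int,
    rounds: int,
    seed: int = 0,
) -> Set[ClonePair]:
    """Run an unmodified detector on random partitions over several rounds.

    Assumption: `detect` wraps the subject tool and is only ever given a
    subset small enough for it to handle on standard hardware.
    """
    rng = random.Random(seed)
    pool = list(files)
    found: Set[ClonePair] = set()
    for _ in range(rounds):
        # Reshuffle so that each round co-locates different files.
        rng.shuffle(pool)
        for start in range(0, len(pool), partition_size):
            subset = pool[start:start + partition_size]
            for a, b in detect(subset):
                # Normalise pair order so repeated finds are not duplicated.
                found.add((a, b) if a <= b else (b, a))
    return found


def toy_detect(subset: List[str]) -> List[ClonePair]:
    """Hypothetical stand-in tool: files sharing a name prefix are 'clones'."""
    pairs = []
    for i, a in enumerate(subset):
        for b in subset[i + 1:]:
            if a.split("_")[0] == b.split("_")[0]:
                pairs.append((a, b))
    return pairs
```

A clone pair is reported only if both files land in the same partition in at least one round, which is where the loss of recall comes from in this sketch; running more rounds raises the chance that any given pair is co-located.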
Keywords :
Java; data mining; public domain software; software engineering; very large databases; classical clone detection tool scaling; detection rounds; global Java open-source development community; shuffling framework; software mining applications; software mining experiments; ultra-large datasets; Cloning; Gold; Hardware; Measurement; Scalability; Standards; Clone detection; large dataset
Conference_Title :
Software Clones (IWSC), 2013 7th International Workshop on
Conference_Location :
San Francisco, CA
DOI :
10.1109/IWSC.2013.6613037