DocumentCode :
3481432
Title :
Reasoning about Global Clones: Scalable Semantic Clone Detection
Author :
Schugerl, Philipp ; Rilling, Juergen ; Charland, Philippe
Author_Institution :
Dept. of Comput. Sci. & Software Eng., Concordia Univ., Montreal, QC, Canada
fYear :
2011
fDate :
18-22 July 2011
Firstpage :
486
Lastpage :
491
Abstract :
The Semantic Web is slowly transforming the Web as we know it into a machine-understandable pool of information that can be consumed and reasoned about by various clients. Source code is no exception to this trend, and various communities have proposed standards to share code as linked data. With the availability of large amounts of open source code in publicly accessible repositories, and the introduction of massive horizontal scaling frameworks and cloud computing infrastructures, a new era of software mining across information silos is reshaping the software engineering landscape. Given these technological advances, analyzing code at a global scale, across systems, projects, and organizational boundaries, becomes feasible. In this paper, we introduce a clone detection algorithm and its implementation that can scale to such large global datasets by modeling clones using description logic and applying a horizontal scaling Semantic Web reasoner. We demonstrate how our simple feature vector, which uses only control statements, data types, and method calls, can yield results similar to those of other popular clone detection tools. Our approach not only allows us to reliably identify clones in a global context; by using a semantic reasoner, it also allows us to expand clone detection to a new class of semantic clones. To validate our approach, we compared our algorithm with some of the leading clone detection tools (DECKARD, CCFinder, JCD, and Simian) and show the differences in detected clones and performance.
Keywords :
data mining; semantic Web; software engineering; clone detection algorithm; cloud computing infrastructures; control statements; data types; description logic; feature vector; horizontal scaling semantic Web reasoner; massive horizontal scaling framework; method calls; open source code; scalable semantic clone detection; semantic reasoner; software mining; Cloning; Cognition; Data models; Java; Semantics; MapReduce; clone detection; reasoner
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Software and Applications Conference (COMPSAC), 2011 IEEE 35th Annual
Conference_Location :
Munich
ISSN :
0730-3157
Print_ISBN :
978-1-4577-0544-1
Electronic_ISBN :
0730-3157
Type :
conf
DOI :
10.1109/COMPSAC.2011.69
Filename :
6032385