Title :
Java bytecode clone detection via relaxation on code fingerprint and Semantic Web reasoning
Author :
Keivanloo, Iman ; Roy, Chanchal K. ; Rilling, Juergen
Author_Institution :
Dept. of Comput. Sci., Concordia Univ., Montreal, QC, Canada
Abstract :
While finding clones in source code has drawn considerable attention, there has been very little work on finding similar fragments in binary code and intermediate languages, such as Java bytecode. Some recent studies showed that it is possible to find distinct sets of clone pairs in the bytecode representation of source code, which are not always detectable at the source code level. In this paper, we present a bytecode clone detection approach, called SeByte, which exploits the benefits of compilers (the bytecode representation) for detecting a specific type of semantic clones in Java bytecode. SeByte is a hybrid metric-based approach that takes advantage of both Semantic Web technologies and set theory. We use a two-step analysis process: (1) pattern matching via Semantic Web querying and reasoning, and (2) content matching, using the Jaccard coefficient for set similarity measurement. Semantic Web-based pattern matching helps us to find method blocks that share similar patterns even in cases of extreme dissimilarity (e.g., numerous repetitions or large gaps). Although it leads to high recall, it also yields a high false positive rate. We thus use content matching (via Jaccard) to reduce the false positive rate by focusing on semantic resemblance of content. Our evaluation on four Java systems and against five other tools shows that SeByte can detect a large number of semantic clones that are either not detected or not supported by source code-based clone detectors.
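The content matching step relies on the Jaccard coefficient, J(A, B) = |A ∩ B| / |A ∪ B|. The following is a minimal sketch of that measure only, not SeByte's implementation. The class name, the method-call tokens used as set elements, and the 0.5 threshold are all hypothetical; the abstract does not say which bytecode features populate the sets or which cutoff is used.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of Jaccard-based content matching.
final class JaccardSketch {

    // Jaccard coefficient |A ∩ B| / |A ∪ B|.
    // Assumption: two empty sets are treated as identical (similarity 1.0).
    static <T> double jaccard(Set<T> a, Set<T> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<T> intersection = new HashSet<>(a);
        intersection.retainAll(b); // keep only elements present in both sets
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    // A candidate pair is kept when its similarity reaches the threshold.
    // The threshold is an assumed tuning parameter, not a value from the paper.
    static <T> boolean isCloneCandidate(Set<T> a, Set<T> b, double threshold) {
        return jaccard(a, b) >= threshold;
    }

    public static void main(String[] args) {
        // Hypothetical content tokens for two method blocks.
        Set<String> m1 = new HashSet<>(List.of(
                "java.lang.String.length",
                "java.util.List.add",
                "java.io.PrintStream.println"));
        Set<String> m2 = new HashSet<>(List.of(
                "java.lang.String.length",
                "java.util.List.add",
                "java.util.Map.put"));

        // Two shared tokens out of four distinct tokens overall.
        System.out.println(jaccard(m1, m2));               // prints 0.5
        System.out.println(isCloneCandidate(m1, m2, 0.5)); // prints true
    }
}
```

In a two-step process like the one described, a filter of this kind would run only on the pairs already returned by pattern matching, so that pairs with little shared content are discarded as likely false positives.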
Keywords :
Java; inference mechanisms; program compilers; query processing; semantic Web; set theory; software metrics; Jaccard coefficient; Java bytecode clone detection; SeByte; binary code; bytecode representation; code fingerprint relaxation; compilers; content matching; hybrid metric-based approach; intermediate languages; semantic Web querying; semantic Web reasoning; semantic Web technologies; semantic Web-based pattern matching; semantic clones; set similarity measurement; set theory; source code clones; Cloning; Cognition; Fingerprint recognition; Java; Pattern matching; Semantic Web; Semantics; Java bytecode; Semantic Web; clone detection;
Conference_Title :
Software Clones (IWSC), 2012 6th International Workshop on
Conference_Location :
Zurich
Print_ISBN :
978-1-4673-1794-8
DOI :
10.1109/IWSC.2012.6227864