DocumentCode :
2458703
Title :
Clone detection using abstract syntax trees
Author :
Baxter, Ira D. ; Yahin, Andrew ; Moura, Leonardo ; Anna, Marcelo Sant ; Bier, Lorraine
Author_Institution :
Semantic Designs, Austin, TX, USA
fYear :
1998
fDate :
16-20 Nov 1998
Firstpage :
368
Lastpage :
377
Abstract :
Existing research suggests that a considerable fraction (5-10%) of the source code of large-scale computer programs is duplicate code (“clones”). Detection and removal of such clones promises decreased software maintenance costs of possibly the same magnitude. Previous work was limited to detecting either near misses differing only in single lexemes, or near misses only between complete functions. The paper presents simple and practical methods for detecting exact and near-miss clones over arbitrary program fragments in program source code by using abstract syntax trees. Previous work also did not suggest practical means for removing detected clones. Since our methods operate in terms of the program structure, clones could be removed by mechanical methods producing in-lined procedures or standard preprocessor macros. A tool using these techniques is applied to a C production software system of some 400K source lines, and the results confirm the levels of duplication found by previous work. The tool produces the macro bodies needed for clone removal, and the macro invocations to replace the clones. The tool uses a variation of the well-known compiler method for detecting common subexpressions. This method determines exact tree matches; a number of adjustments are needed to detect equivalent statement sequences, commutative operands, and nearly exact matches. We additionally suggest that clone detection could also be useful in producing more structured code, and in reverse engineering to discover domain concepts and their implementations.
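The abstract describes a compiler-style common-subexpression technique for finding exact tree matches, with an adjustment for commutative operands. Below is a minimal illustrative sketch of that idea, not the authors' implementation. It makes several assumptions:

- ASTs are modeled as hypothetical nested tuples `(op, child, ...)`, and leaves are plain strings.
- The `COMMUTATIVE` operator set and the `min_mass` threshold are illustrative choices.
- The helper names `canonical`, `size`, `subtrees`, and `find_clones` are invented here.

Equivalent statement sequences and near-miss matching are not covered.

```python
from collections import defaultdict

# Hypothetical toy AST: a node is a tuple (op, child, ...); a leaf is a string.
# Assumption: this set of operators is treated as commutative.
COMMUTATIVE = {"+", "*", "==", "&&", "||"}


def canonical(node):
    """Return a hashable canonical form; operands of commutative ops are sorted."""
    if isinstance(node, str):
        return node
    op, *kids = node
    forms = [canonical(k) for k in kids]
    if op in COMMUTATIVE:
        # Sort by repr so that strings and tuples can be compared uniformly.
        forms.sort(key=repr)
    return (op, *forms)


def size(node):
    """Count the nodes in a subtree (its 'mass')."""
    if isinstance(node, str):
        return 1
    return 1 + sum(size(k) for k in node[1:])


def subtrees(node):
    """Yield every subtree of node, including node itself."""
    yield node
    if not isinstance(node, str):
        for k in node[1:]:
            yield from subtrees(k)


def find_clones(roots, min_mass=3):
    """Bucket subtrees by canonical form and return buckets with 2+ members.

    The dict hashes each canonical form and then checks equality, which
    mirrors 'hash into buckets, then compare trees within a bucket'.
    Subtrees smaller than min_mass (an assumed threshold) are ignored.
    """
    buckets = defaultdict(list)
    for root in roots:
        for sub in subtrees(root):
            if size(sub) >= min_mass:
                buckets[canonical(sub)].append(sub)
    return [group for group in buckets.values() if len(group) > 1]


# Usage: two assignments that differ only in commutative operand order.
f1 = ("=", "a", ("+", ("*", "b", "c"), "d"))
f2 = ("=", "a", ("+", "d", ("*", "c", "b")))
clones = find_clones([f1, f2])
```

For `f1` and `f2`, this sketch reports three clone groups: the whole assignment, the `+` expression, and the `*` expression. A fuller tool would probably also discard clones that lie entirely inside a larger reported clone, so that only the outermost match is kept.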
Keywords :
C language; program compilers; program diagnostics; reverse engineering; software maintenance; trees (mathematics); C production software system; abstract syntax trees; arbitrary program fragments; clone detection; clone removal; common subexpressions; commutative operands; compiler method; detected clones; domain concepts; duplicate code; equivalent statement sequences; exact tree matches; in-lined procedures; large scale computer programs; lexemes; macro bodies; macro invocations; mechanical methods; near miss clones; nearly exact matches; program source code; program structure; reverse engineering; software maintenance costs; standard preprocessor macros; structured code; Cloning; Costs; Encapsulation; Large-scale systems; Production systems; Programming profession; Software engineering; Software maintenance; Software systems; World Wide Web;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Proceedings of the International Conference on Software Maintenance, 1998
Conference_Location :
Bethesda, MD
ISSN :
1063-6773
Print_ISBN :
0-8186-8779-7
Type :
conf
DOI :
10.1109/ICSM.1998.738528
Filename :
738528