Title :
Clone detection: Why, what and how?
Author :
Akhin, Marat ; Itsykson, Vladimir
Author_Institution :
St.-Petersburg State Polytech. Univ., St. Petersburg, Russia
Abstract :
Excessive code duplication is a bane of modern software development. Several experimental studies show that source code clones (repeatedly reused fragments of similar code) can make up 15 percent of a software system on average. While code duplication may increase the speed of initial software development, it undoubtedly leads to problems during software maintenance and support. That is why many developers agree that software clones should be detected and dealt with at every stage of the software development life cycle. This paper is a brief survey of the current state of the art in clone detection. First, we highlight the main sources of code cloning, such as copy-and-paste programming, mental code patterns and performance optimizations. We discuss the reasons behind the use of these techniques from the developer's point of view and possible alternatives to them. Second, we outline the major negative effects that clones have on software development. The most serious drawback of duplicated code for software maintenance is the increased cost of modifications: any modification that changes cloned code must be propagated to every clone instance in the program. Software clones may also introduce new bugs when a programmer makes mistakes while copying and modifying code. The growth in source code size caused by duplication also makes the code harder to comprehend. Third, we review existing clone detection techniques and classify them by the source code representation model they use. We also describe and analyze some concrete examples of clone detection techniques, highlighting the main distinctive features and problems that arise in practical clone detection. Finally, we point out some open problems in the area of clone detection. Currently, questions like "What is a code clone?", "Can we predict the impact clones have on software quality?" and "How can we increase both clone detection precision and recall at the same time?" remain open to further research. We list the most important questions in modern clone detection and explain why they remain unanswered despite all the progress in clone detection research.
Keywords :
program debugging; software maintenance; clone detection techniques; code comprehension; code duplication; copy-and-paste programming; mental code patterns; performance optimizations; software bugs; software development life cycle; software support; source code size; Cloning; Data mining; Electronic mail; Linux; Programming; Clone detection; overview; program analysis; quality assurance
Conference_Title :
Software Engineering Conference (CEE-SECR), 2010 6th Central and Eastern European
Conference_Location :
Moscow
Print_ISBN :
978-1-4577-0605-9
DOI :
10.1109/CEE-SECR.2010.5783148