Can I clone this piece of code here?

Author

Xiaoyin Wang ; Yingnong Dang ; Lu Zhang ; Dongmei Zhang ; Erica Lan ; Hong Mei

Author_Institution

Key Lab. of High Confidence Software Technol., Peking Univ., Beijing, China

fYear

2012

fDate

3-7 Sept. 2012

Firstpage

170

Lastpage

179

Abstract

While code cloning is a convenient way for developers to reuse existing code, it can have negative impacts, such as degrading code quality or increasing maintenance costs. Some cloned code pieces are viewed as harmless because they evolve independently, while others are viewed as harmful because they need to be changed consistently, thus incurring extra maintenance costs. Recent studies demonstrate that neither the percentage of harmful code clones nor that of harmless code clones is negligible. To assist developers in leveraging the benefits of harmless code cloning and in avoiding the negative impacts of harmful code cloning, we propose a novel approach that automatically predicts the harmfulness of a code cloning operation at the point of performing copy-and-paste. Our insight is that the potential harmfulness of a code cloning operation may relate to characteristics of the code to be cloned and of its context. Based on a number of features extracted from the cloned code and the context of the code cloning operation, we use Bayesian Networks, a machine-learning technique, to predict the harmfulness of an intended code cloning operation. We evaluated our approach on two large-scale industrial software projects under two usage scenarios: 1) approving only cloning operations predicted to be very likely harmless, and 2) blocking only cloning operations predicted to be very likely harmful. In the first scenario, our approach is able to approve more than 50% of cloning operations with a precision higher than 94.9% in both subjects. In the second scenario, our approach is able to avoid more than 48% of the harmful cloning operations by blocking only 15% of the cloning operations for the first subject, and avoid more than 67% of the harmful cloning operations by blocking only 34% of the cloning operations for the second subject.
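
The two usage scenarios in the abstract amount to thresholding a predicted probability of harm. The following is a minimal illustrative sketch only, not the authors' implementation: the function names, the threshold values, and the toy data are all assumptions, and the probabilities are taken as given rather than produced by a Bayesian network.

```python
from typing import List, Tuple

# Each entry is (predicted probability of harm, whether the clone was actually harmful).
# The values below are made-up toy data, not results from the paper.
Scored = List[Tuple[float, bool]]


def approval_stats(scored: Scored, approve_below: float) -> Tuple[float, float]:
    """Scenario 1: approve only operations predicted very likely harmless.

    Returns (fraction of operations approved, precision of the approvals).
    The threshold `approve_below` is a hypothetical parameter.
    """
    approved = [harmful for p, harmful in scored if p <= approve_below]
    if not approved:
        return 0.0, 0.0
    coverage = len(approved) / len(scored)
    precision = sum(1 for harmful in approved if not harmful) / len(approved)
    return coverage, precision


def blocking_stats(scored: Scored, block_above: float) -> Tuple[float, float]:
    """Scenario 2: block only operations predicted very likely harmful.

    Returns (fraction of operations blocked, fraction of harmful operations avoided).
    The threshold `block_above` is a hypothetical parameter.
    """
    blocked = [harmful for p, harmful in scored if p >= block_above]
    total_harmful = sum(1 for _, harmful in scored if harmful)
    if total_harmful == 0:
        return len(blocked) / len(scored), 0.0
    avoided = sum(1 for harmful in blocked if harmful) / total_harmful
    return len(blocked) / len(scored), avoided


if __name__ == "__main__":
    toy = [(0.01, False), (0.02, False), (0.03, True),
           (0.50, False), (0.90, True), (0.95, True)]
    print(approval_stats(toy, approve_below=0.05))
    print(blocking_stats(toy, block_above=0.80))
```

Lowering `approve_below` trades coverage for precision in the first scenario, and raising `block_above` trades harmful clones avoided for fewer blocked operations in the second.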

Keywords

belief networks; software reusability; Bayesian networks; cloning operation; code quality degradation; copy-and-paste operation; harmful code cloning; harmless code cloning; large-scale industrial software project; machine learning technique; maintenance cost; software reuse; Code cloning; Harmfulness prediction; Programming aid

fLanguage

English

Publisher

IEEE

Conference_Title

Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2012

Conference_Location

Essen

Print_ISBN

978-1-4503-1204-2

Type

conf

DOI

10.1145/2351676.2351701

Filename

6494924