• DocumentCode
    1900934
  • Title

    Can I clone this piece of code here?

  • Author

    Xiaoyin Wang; Yingnong Dang; Lu Zhang; Dongmei Zhang; Erica Lan; Hong Mei

  • Author_Institution
    Key Laboratory of High Confidence Software Technologies, Peking University, Beijing, China
  • fYear
    2012
  • fDate
    3-7 Sept. 2012
  • Firstpage
    170
  • Lastpage
    179
  • Abstract
    While code cloning is a convenient way for developers to reuse existing code, it can degrade code quality or increase maintenance costs. Some cloned code pieces are viewed as harmless because they evolve independently, while others are viewed as harmful because they must be changed consistently, which incurs extra maintenance costs. Recent studies demonstrate that neither the percentage of harmful code clones nor that of harmless code clones is negligible. To help developers benefit from harmless code cloning and avoid the negative impacts of harmful code cloning, we propose a novel approach that automatically predicts the harmfulness of a code cloning operation at the point of performing copy-and-paste. Our insight is that the potential harmfulness of a code cloning operation may relate to characteristics of the code to be cloned and characteristics of its context. Based on a number of features extracted from the cloned code and the context of the code cloning operation, we use Bayesian Networks, a machine-learning technique, to predict the harmfulness of an intended code cloning operation. We evaluated our approach on two large-scale industrial software projects under two usage scenarios: 1) approving only cloning operations predicted to be very likely harmless, and 2) blocking only cloning operations predicted to be very likely harmful. In the first scenario, our approach is able to approve more than 50% of cloning operations with a precision higher than 94.9% in both subjects. In the second scenario, our approach is able to avoid more than 48% of the harmful cloning operations by blocking only 15% of the cloning operations for the first subject, and avoid more than 67% of the harmful cloning operations by blocking only 34% of the cloning operations for the second subject.
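The two usage scenarios in the abstract can be read as two thresholds on a predicted probability of harm. The following is a minimal sketch under that reading, not the authors' implementation: it assumes some classifier (a Bayesian network in the paper) has already produced a probability `p_harm` for each intended cloning operation. The function names and the threshold values `0.05` and `0.90` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def approve_harmless(p_harm: float, max_harm: float = 0.05) -> bool:
    """Scenario 1: approve only clones predicted very likely harmless.

    max_harm is a hypothetical cutoff, not a value from the paper.
    """
    return p_harm <= max_harm


def block_harmful(p_harm: float, min_harm: float = 0.90) -> bool:
    """Scenario 2: block only clones predicted very likely harmful.

    min_harm is a hypothetical cutoff, not a value from the paper.
    """
    return p_harm >= min_harm


def approval_stats(
    samples: List[Tuple[float, bool]], max_harm: float = 0.05
) -> Tuple[float, float]:
    """Return (fraction of operations approved, precision of approvals).

    Each sample is (predicted probability of harm, whether the clone
    actually turned out to be harmful). Precision is the share of
    approved operations that were in fact harmless.
    """
    approved = [harmful for p, harmful in samples if approve_harmless(p, max_harm)]
    if not approved:
        return 0.0, 0.0
    coverage = len(approved) / len(samples)
    precision = sum(1 for harmful in approved if not harmful) / len(approved)
    return coverage, precision


# Made-up predictions for illustration only.
demo = [(0.01, False), (0.02, False), (0.50, True), (0.95, True)]
coverage, precision = approval_stats(demo)
```

Lowering `max_harm` would trade coverage (fewer approvals) for precision, which is the kind of trade-off the reported "more than 50% approved at precision above 94.9%" figure summarizes.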
  • Keywords
    belief networks; software reusability; Bayesian networks; cloning operation; code quality degradation; copy-and-paste operation; harmful code cloning; harmless code cloning; large-scale industrial software project; machine learning technique; maintenance cost; software reuse; Code cloning; Harmfulness prediction; Programming aid
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2012
  • Conference_Location
    Essen
  • Print_ISBN
    978-1-4503-1204-2
  • Type

    conf

  • DOI
    10.1145/2351676.2351701
  • Filename
    6494924