• DocumentCode
    2744883
  • Title
    DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones
  • Author
    Jiang, Lingxiao ; Misherghi, Ghassan ; Su, Zhendong ; Glondu, Stéphane
  • Author_Institution
    Univ. of California, Davis, CA
  • fYear
    2007
  • fDate
    20-26 May 2007
  • Firstpage
    96
  • Lastpage
    105
  • Abstract
    Detecting code clones has many software engineering applications. Existing approaches either do not scale to large code bases or are not robust against minor code modifications. In this paper, we present an efficient algorithm for identifying similar subtrees and apply it to tree representations of source code. Our algorithm is based on a novel characterization of subtrees with numerical vectors in the Euclidean space ℝⁿ and an efficient algorithm to cluster these vectors w.r.t. the Euclidean distance metric. Subtrees with vectors in one cluster are considered similar. We have implemented our tree similarity algorithm as a clone detection tool called DECKARD and evaluated it on large code bases written in C and Java including the Linux kernel and JDK. Our experiments show that DECKARD is both scalable and accurate. It is also language independent, applicable to any language with a formally specified grammar.
  • Keywords
    software engineering; trees (mathematics); Deckard; Euclidean distance metric; code clones; source code; subtrees; tree representations; tree-based detection; Application software; Cloning; Clustering algorithms; Euclidean distance; Fingerprint recognition; Java; Linux; Programming profession; Robustness; Software engineering
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Software Engineering, 2007. ICSE 2007. 29th International Conference on
  • Conference_Location
    Minneapolis, MN
  • ISSN
    0270-5257
  • Print_ISBN
    0-7695-2828-7
  • Type
    conf
  • DOI
    10.1109/ICSE.2007.30
  • Filename
    4222572
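The abstract describes two steps: map each subtree to a numerical vector in ℝⁿ, then cluster the vectors by Euclidean distance so that subtrees in one cluster count as similar. The Python sketch below illustrates that idea under simplifying assumptions. It assumes a toy `Node` type and lets each vector simply count occurrences of a fixed list of node kinds (the paper's exact characterization may be richer). It also replaces the paper's efficient clustering with a naive pairwise comparison merged by union-find (single-link grouping). The names `characteristic_vector`, `cluster_similar`, `threshold` and `min_size` are hypothetical and not taken from DECKARD.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical minimal syntax-tree node: a kind label plus children."""
    kind: str
    children: List["Node"] = field(default_factory=list)


def all_subtrees(root: Node) -> List[Node]:
    """Return every node in the tree; each node roots one subtree."""
    out, stack = [], [root]
    while stack:
        n = stack.pop()
        out.append(n)
        stack.extend(n.children)
    return out


def characteristic_vector(node: Node, kinds: List[str]) -> Tuple[int, ...]:
    """Assumed characterization: count each listed node kind in the subtree."""
    counts: Dict[str, int] = {k: 0 for k in kinds}
    for n in all_subtrees(node):
        if n.kind in counts:
            counts[n.kind] += 1
    return tuple(counts[k] for k in kinds)


def cluster_similar(root: Node, kinds: List[str],
                    threshold: float, min_size: int) -> List[List[Node]]:
    """Group subtrees whose vectors lie within `threshold` (Euclidean).

    Naive O(n^2) pairwise stand-in for an efficient clustering step.
    Subtrees with fewer than `min_size` counted nodes are skipped so that
    trivial leaves are not reported as clones.
    """
    pairs = [(t, characteristic_vector(t, kinds)) for t in all_subtrees(root)]
    pairs = [(t, v) for t, v in pairs if sum(v) >= min_size]

    parent = list(range(len(pairs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            if math.dist(pairs[i][1], pairs[j][1]) <= threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups: Dict[int, List[Node]] = {}
    for i, (t, _) in enumerate(pairs):
        groups.setdefault(find(i), []).append(t)
    # Only groups with at least two members are candidate clone sets.
    return [g for g in groups.values() if len(g) > 1]
```

For example, with `kinds = ["if", "while", "cond", "assign", "call", "id", "lit"]`, a block holding two structurally identical `if` statements and one unrelated `while` loop yields, at `threshold=0.0` and `min_size=4`, one group with the two `if` subtrees and one with their nested `assign` subtrees. Raising `threshold` above zero lets slightly modified copies fall into the same group, which is how a vector-distance approach tolerates minor edits.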