• DocumentCode
    2761001
  • Title

    Natural Language Processing Based Detection of Duplicate Defect Patterns

  • Author

    Wu, Qian ; Wang, Qianxiang

  • fYear
    2010
  • fDate
    19-23 July 2010
  • Firstpage
    220
  • Lastpage
    225
  • Abstract
    A defect pattern repository collects different kinds of defect patterns, which are general descriptions of the characteristics of commonly occurring software code defects. Defect patterns can be widely used by programmers, static defect analysis tools, and even runtime verification. Following the idea of Web 2.0, defect pattern repositories allow these users to submit defect patterns they have found. However, submission of duplicate patterns leads to redundancy in the repository. This paper introduces an approach that suggests potential duplicates based on natural language processing. Our approach first computes field similarities based on the Vector Space Model, then employs Information Entropy to determine the importance of each field, and then combines the field similarities into the final defect pattern similarity. Two strategies are introduced to make our approach adaptive to special situations. Finally, groups of duplicates are obtained by adopting Hierarchical Clustering. Evaluation indicates that our approach could detect most of the actual duplicates (72% in our experiment) in the repository.
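    The pipeline named in the abstract (per-field Vector Space Model similarity, entropy-based field weights, a combined similarity, then Hierarchical Clustering) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not give the formulas, so the following are assumptions: smoothed TF-IDF term weights, cosine similarity, field weights proportional to the Shannon entropy of each field's term distribution, average-linkage clustering with a fixed threshold, and the field names `title` and `description`. The two adaptive strategies mentioned in the abstract are not specified there and are left out.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()

def tfidf_vectors(texts):
    # Vector Space Model: one sparse TF-IDF vector (dict) per text.
    docs = [Counter(tokenize(t)) for t in texts]
    n = len(docs)
    df = Counter(term for d in docs for term in d)
    # Smoothed IDF (an assumption) keeps terms found in every text above zero.
    return [{t: tf * (math.log((1 + n) / (1 + df[t])) + 1) for t, tf in d.items()}
            for d in docs]

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def field_entropy(texts):
    # Shannon entropy of the term distribution of one field over the repository.
    counts = Counter(tok for t in texts for tok in tokenize(t))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def pattern_similarity(patterns, fields):
    # Assumed weighting: each field's weight is its share of the total entropy.
    entropies = {f: field_entropy([p.get(f, "") for p in patterns]) for f in fields}
    z = sum(entropies.values()) or 1.0
    weights = {f: entropies[f] / z for f in fields}
    vecs = {f: tfidf_vectors([p.get(f, "") for p in patterns]) for f in fields}
    n = len(patterns)
    return [[sum(weights[f] * cosine(vecs[f][i], vecs[f][j]) for f in fields)
             for j in range(n)] for i in range(n)]

def cluster(sim, threshold):
    # Agglomerative clustering with average linkage; merging stops once the
    # closest pair of clusters is less similar than the threshold.
    clusters = [[i] for i in range(len(sim))]

    def linkage(a, b):
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, (x, y) = max(
            ((linkage(a, b), (x, y))
             for x, a in enumerate(clusters)
             for y, b in enumerate(clusters) if x < y),
            key=lambda item: item[0])
        if best < threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

# Hypothetical repository entries, invented for illustration only.
patterns = [
    {"title": "Null pointer dereference",
     "description": "Dereferencing a reference that may be null causes a crash"},
    {"title": "Null pointer dereference bug",
     "description": "A reference that may be null is dereferenced and causes a crash"},
    {"title": "Unclosed file stream",
     "description": "A file stream is opened but never closed, leaking resources"},
]
sim = pattern_similarity(patterns, ["title", "description"])
groups = cluster(sim, threshold=0.4)
print(groups)
```

    Each list in `groups` holds the indices of patterns suggested as duplicates of one another. The threshold of 0.4 is arbitrary and would need tuning against a labeled repository.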
  • Keywords
    Internet; entropy; natural language processing; pattern clustering; program diagnostics; Web 2.0; defect pattern repositories; duplicate defect pattern detection; hierarchical clustering; information entropy; software code defects; static defect analysis tools; vector space model; Defect Pattern; Duplicate; Information Retrieval; Natural Language Processing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Software and Applications Conference Workshops (COMPSACW), 2010 IEEE 34th Annual
  • Conference_Location
    Seoul
  • Print_ISBN
    978-1-4244-8089-0
  • Electronic_ISBN
    978-0-7695-4105-1
  • Type

    conf

  • DOI
    10.1109/COMPSACW.2010.45
  • Filename
    5615790