DocumentCode
2761001
Title
Natural Language Processing Based Detection of Duplicate Defect Patterns
Author
Wu, Qian ; Wang, Qianxiang
fYear
2010
fDate
19-23 July 2010
Firstpage
220
Lastpage
225
Abstract
A defect pattern repository collects different kinds of defect patterns, which are general descriptions of the characteristics of commonly occurring software code defects. Defect patterns can be used widely by programmers, by static defect analysis tools, and even in runtime verification. Following the idea of Web 2.0, defect pattern repositories allow these users to submit the defect patterns they have found. However, submitting duplicate patterns leads to redundancy in the repository. This paper introduces an approach, based on natural language processing, that suggests potential duplicates. Our approach first computes field similarities based on the Vector Space Model, next uses information entropy to determine the importance of each field, and then combines the field similarities into the final defect pattern similarity. Two strategies are introduced to make our approach adaptive to special situations. Finally, groups of duplicates are obtained by hierarchical clustering. Evaluation indicates that our approach can detect most of the actual duplicates in the repository (72% in our experiment).
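The abstract describes a four-step pipeline: per-field similarity, entropy-based field weights, a weighted combination, and hierarchical clustering. The Python sketch below illustrates one possible reading of that pipeline; it is not the authors' implementation. Assumptions, none of which come from the record: each pattern is a dict of text fields (the field names `name` and `description` are hypothetical), TF-IDF vectors with cosine similarity stand in for the Vector Space Model, field importance is computed with the standard entropy weight method over each field's pairwise similarities, clustering is single-linkage agglomerative clustering stopped at a hypothetical `threshold`, and the two adaptive strategies mentioned in the abstract are not modeled.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lower-case a field and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(token_lists):
    """Vector Space Model (assumed TF-IDF): one sparse vector per document."""
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    # Smoothed IDF, so a term present in every document still has weight 1.
    return [{t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(tokens).items()}
            for tokens in token_lists]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def field_similarities(patterns, fields):
    """Pairwise similarity of every pattern pair, computed separately per field."""
    pairs = list(combinations(range(len(patterns)), 2))
    sims = {}
    for f in fields:
        vecs = tfidf_vectors([tokenize(p.get(f, "")) for p in patterns])
        sims[f] = {(i, j): cosine(vecs[i], vecs[j]) for i, j in pairs}
    return pairs, sims


def entropy_weights(pairs, sims):
    """Assumed entropy weight method: lower entropy (more discriminating) -> more weight."""
    m, fields = len(pairs), list(sims)
    if m < 2:
        return {f: 1 / len(fields) for f in fields}
    divergence = {}
    for f in fields:
        total = sum(sims[f].values())
        if total == 0:
            divergence[f] = 0.0  # a field that never matches carries no information
            continue
        ps = [s / total for s in sims[f].values() if s > 0]
        entropy = -sum(p * math.log(p) for p in ps) / math.log(m)
        divergence[f] = 1 - entropy
    norm = sum(divergence.values())
    if norm == 0:
        return {f: 1 / len(fields) for f in fields}
    return {f: d / norm for f, d in divergence.items()}


def cluster(n, sim, threshold):
    """Single-linkage agglomerative clustering; stop when the best link < threshold."""
    clusters = [{i} for i in range(n)]
    while len(clusters) > 1:
        best, best_pair = -1.0, None
        for a, b in combinations(range(len(clusters)), 2):
            link = max(sim[(min(i, j), max(i, j))] for i in clusters[a] for j in clusters[b])
            if link > best:
                best, best_pair = link, (a, b)
        if best < threshold:
            break
        a, b = best_pair  # a < b, so popping b leaves index a valid
        merged = clusters.pop(b)
        clusters[a] |= merged
    return [sorted(c) for c in clusters if len(c) > 1]  # only groups of duplicates


def detect_duplicates(patterns, fields, threshold):
    """Field similarities -> entropy weights -> combined similarity -> clusters."""
    pairs, sims = field_similarities(patterns, fields)
    weights = entropy_weights(pairs, sims)
    combined = {pair: sum(weights[f] * sims[f][pair] for f in fields) for pair in pairs}
    return cluster(len(patterns), combined, threshold)


patterns = [  # hypothetical example patterns
    {"name": "Null pointer dereference", "description": "dereferencing a pointer that may be null"},
    {"name": "Null pointer dereference", "description": "a possibly null pointer is dereferenced"},
    {"name": "Resource leak", "description": "file stream opened but never closed"},
]
print(detect_duplicates(patterns, ["name", "description"], threshold=0.5))  # [[0, 1]]
```

Single linkage cut at a threshold produces the same groups as the connected components of the "similar enough" graph, which favors recall; average or complete linkage would give tighter groups at some cost in recall.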
Keywords
Internet; entropy; natural language processing; pattern clustering; program diagnostics; Web 2.0; defect pattern repositories; duplicate defect pattern detection; hierarchical clustering; information entropy; natural language processing; software code defects; static defect analysis tools; vector space model; Defect Pattern; Duplicate; Information Retrieval; Natural Language Processing;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Software and Applications Conference Workshops (COMPSACW), 2010 IEEE 34th Annual
Conference_Location
Seoul
Print_ISBN
978-1-4244-8089-0
Electronic_ISBN
978-0-7695-4105-1
Type
conf
DOI
10.1109/COMPSACW.2010.45
Filename
5615790