DocumentCode
2439970
Title
Automated Duplicate Bug Report Classification Using Subsequence Matching
Author
Banerjee, Sean ; Cukic, Bojan ; Adjeroh, Donald
Author_Institution
Lane Dept. of Comput. Sci. & Electr. Eng., West Virginia Univ., Morgantown, WV, USA
fYear
2012
fDate
25-27 Oct. 2012
Firstpage
74
Lastpage
81
Abstract
The use of open bug tracking repositories like Bugzilla is common in many software applications. They allow developers, testers and users to report problems associated with the system and to track resolution status. Open and democratic reporting tools, however, face one major challenge: users can, and often do, submit reports describing the same problem. Research in duplicate report detection has primarily focused on word-frequency-based similarity measures, paying little regard to the context or structure of the reporting language. Thus, in large repositories, reports describing different issues may be marked as duplicates due to the frequent use of common words. In this paper, we present Factor LCS, a methodology which utilizes common sequence matching for duplicate report detection. We demonstrate the approach by analyzing the complete Firefox bug repository up until March 2012 as well as a smaller subset of the Eclipse dataset from January 1, 2008 to December 31, 2008. We achieve a duplicate recall rate above 70% with Firefox, which exceeds the results reported on smaller subsets of the same repository.
Keywords
formal verification; program debugging; software engineering; Bugzilla; Eclipse dataset; Factor LCS; Firefox bug repository; automated duplicate bug report classification; common sequence matching; duplicate report detection; open bug tracking repositories; software applications; subsequence matching; word frequency based similarity measures; Bioinformatics; Classification algorithms; Context; Frequency measurement; Software; Software algorithms; Vectors; Documentation; Duplicate Bug Reports; Experimentation; String Algorithms; Verification;
fLanguage
English
Publisher
IEEE
Conference_Titel
High-Assurance Systems Engineering (HASE), 2012 IEEE 14th International Symposium on
Conference_Location
Omaha, NE
ISSN
1530-2059
Print_ISBN
978-1-4673-4742-6
Type
conf
DOI
10.1109/HASE.2012.38
Filename
6375640