Title :
Automated Duplicate Bug Report Classification Using Subsequence Matching
Author :
Banerjee, Sean ; Cukic, Bojan ; Adjeroh, Donald
Author_Institution :
Lane Dept. of Comput. Sci. & Electr. Eng., West Virginia Univ., Morgantown, WV, USA
Abstract :
The use of open bug tracking repositories like Bugzilla is common in many software applications. They allow developers, testers and users to report problems associated with the system and to track resolution status. Open and democratic reporting tools, however, face one major challenge: users can, and often do, submit reports describing the same problem. Research in duplicate report detection has primarily focused on word-frequency-based similarity measures, paying little regard to the context or structure of the reporting language. Thus, in large repositories, reports describing different issues may be marked as duplicates due to the frequent use of common words. In this paper, we present Factor LCS, a methodology that uses common sequence matching for duplicate report detection. We demonstrate the approach by analyzing the complete Firefox bug repository up until March 2012 as well as a smaller subset of the Eclipse dataset from January 1, 2008 to December 31, 2008. We achieve a duplicate recall rate above 70% with Firefox, which exceeds the results reported on smaller subsets of the same repository.
Keywords :
formal verification; program debugging; software engineering; Bugzilla; Eclipse dataset; Factor LCS; Firefox bug repository; automated duplicate bug report classification; common sequence matching; duplicate report detection; open bug tracking repositories; software applications; subsequence matching; word frequency based similarity measures; Bioinformatics; Classification algorithms; Context; Frequency measurement; Software; Software algorithms; Vectors; Documentation; Duplicate Bug Reports; Experimentation; String Algorithms; Verification;
Conference_Title :
High-Assurance Systems Engineering (HASE), 2012 IEEE 14th International Symposium on
Conference_Location :
Omaha, NE
Print_ISBN :
978-1-4673-4742-6
DOI :
10.1109/HASE.2012.38