• DocumentCode
    2439970
  • Title
    Automated Duplicate Bug Report Classification Using Subsequence Matching
  • Author
    Banerjee, Sean ; Cukic, Bojan ; Adjeroh, Donald
  • Author_Institution
    Lane Dept. of Comput. Sci. & Electr. Eng., West Virginia Univ., Morgantown, WV, USA
  • fYear
    2012
  • fDate
    25-27 Oct. 2012
  • Firstpage
    74
  • Lastpage
    81
  • Abstract
    The use of open bug tracking repositories like Bugzilla is common in many software applications. They give developers, testers and users the ability to report problems associated with the system and track resolution status. Open and democratic reporting tools, however, face one major challenge: users can, and often do, submit reports describing the same problem. Research in duplicate report detection has primarily focused on word frequency based similarity measures, paying little regard to the context or structure of the reporting language. Thus, in large repositories, reports describing different issues may be marked as duplicates due to the frequent use of common words. In this paper, we present Factor LCS, a methodology which utilizes common sequence matching for duplicate report detection. We demonstrate the approach by analyzing the complete Firefox bug repository up until March 2012 as well as a smaller subset of the Eclipse dataset from January 1, 2008 to December 31, 2008. We achieve a duplicate recall rate above 70% with Firefox, which exceeds the results reported on smaller subsets of the same repository.
  • Keywords
    formal verification; program debugging; software engineering; Bugzilla; Eclipse dataset; Factor LCS; Firefox bug repository; automated duplicate bug report classification; common sequence matching; duplicate report detection; open bug tracking repositories; software applications; subsequence matching; word frequency based similarity measures; Bioinformatics; Classification algorithms; Context; Frequency measurement; Software; Software algorithms; Vectors; Documentation; Duplicate Bug Reports; Experimentation; String Algorithms; Verification;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High-Assurance Systems Engineering (HASE), 2012 IEEE 14th International Symposium on
  • Conference_Location
    Omaha, NE
  • ISSN
    1530-2059
  • Print_ISBN
    978-1-4673-4742-6
  • Type
    conf
  • DOI
    10.1109/HASE.2012.38
  • Filename
    6375640