• DocumentCode
    492604
  • Title
    An approach to detecting duplicate bug reports using natural language and execution information
  • Author
    Wang, Xiaoyin ; Zhang, Lu ; Xie, Tao ; Anvik, John ; Sun, Jiasu
  • Author_Institution
    Inst. of Software, Peking Univ., Beijing
  • fYear
    2008
  • fDate
    10-18 May 2008
  • Firstpage
    461
  • Lastpage
    470
  • Abstract
    An open source project typically maintains an open bug repository so that bug reports from all over the world can be gathered. When a new bug report is submitted to the repository, a person, called a triager, examines whether it is a duplicate of an existing bug report. If it is, the triager marks it as duplicate and the bug report is removed from consideration for further work. In the literature, there are approaches exploiting only natural language information to detect duplicate bug reports. In this paper we present a new approach that further involves execution information. In our approach, when a new bug report arrives, its natural language information and execution information are compared with those of the existing bug reports. Then, a small number of existing bug reports are suggested to the triager as the most similar bug reports to the new bug report. Finally, the triager examines the suggested bug reports to determine whether the new bug report duplicates an existing bug report. We calibrated our approach on a subset of the Eclipse bug repository and evaluated our approach on a subset of the Firefox bug repository. The experimental results show that our approach can detect 67%-93% of duplicate bug reports in the Firefox bug repository, compared to 43%-72% using natural language information alone.
  • Keywords
    information retrieval; natural language processing; public domain software; Eclipse bug repository; execution information; natural language information; triager; Computer bugs; Computer science education; Costs; Educational technology; Laboratories; Natural languages; Open source software; Software maintenance; Software quality; Testing; duplicate bug report
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Software Engineering, 2008. ICSE '08. ACM/IEEE 30th International Conference on
  • Conference_Location
    Leipzig
  • ISSN
    0270-5257
  • Print_ISBN
    978-1-4244-4486-1
  • Electronic_ISBN
    0270-5257
  • Type
    conf
  • DOI
    10.1145/1368088.1368151
  • Filename
    4814157
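
The ranking step described in the abstract (compare a new report's natural language and execution information against existing reports, then suggest a few of the most similar ones to the triager) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measures or how they are combined, so the term-frequency cosine similarity, the Jaccard overlap of executed method names, the weighted sum, and all names (`suggest_duplicates`, `nl_weight`, the `text`/`trace` fields) are assumptions made for illustration only.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between raw term-frequency vectors of two texts."""
    tf_a = Counter(text_a.lower().split())
    tf_b = Counter(text_b.lower().split())
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(trace_a, trace_b):
    """Overlap between two sets of executed method names (assumed trace format)."""
    if not trace_a and not trace_b:
        return 0.0
    return len(trace_a & trace_b) / len(trace_a | trace_b)


def suggest_duplicates(new_report, existing_reports, top_k=3, nl_weight=0.5):
    """Return ids of the top_k existing reports most similar to new_report.

    The weighted sum is an assumed combination rule, not taken from the paper.
    """
    scored = []
    for report in existing_reports:
        nl = cosine_similarity(new_report["text"], report["text"])
        ex = jaccard_similarity(new_report["trace"], report["trace"])
        scored.append((nl_weight * nl + (1 - nl_weight) * ex, report["id"]))
    # Highest combined similarity first; the triager inspects only the top few.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [report_id for _, report_id in scored[:top_k]]
```

In this sketch each report is a dictionary with an `id`, a free-text `text` field, and a `trace` set of method names; a triager would then check only the returned candidates by hand, as the abstract describes.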