• DocumentCode
    1900464
  • Title

    Duplicate bug report detection with a combination of information retrieval and topic modeling

  • Author

    Anh Tuan Nguyen ; Tung Thanh Nguyen ; Nguyen, Tien N. ; Lo, David ; Chengnian Sun

  • Author_Institution
    Iowa State Univ., Ames, IA, USA
  • fYear
    2012
  • fDate
    3-7 Sept. 2012
  • Firstpage
    70
  • Lastpage
    79
  • Abstract
    Detecting duplicate bug reports reduces triaging effort and saves developers from spending time fixing the same issue more than once. Among automated detection approaches, text-based information retrieval (IR) approaches have been shown to outperform others in terms of both accuracy and time efficiency. However, IR-based approaches perform poorly on duplicate reports that describe the same technical issue in different terms. This paper introduces DBTM, a duplicate bug report detection approach that takes advantage of both IR-based features and topic-based features. DBTM models a bug report as a textual document describing certain technical issue(s), and models duplicate bug reports as reports about the same technical issue(s). Trained on historical data that includes identified duplicate reports, it learns the sets of different terms describing the same technical issues and detects other, not-yet-identified duplicates. Our empirical evaluation on real-world systems shows that DBTM improves on state-of-the-art approaches by up to 20% in accuracy.
  • Keywords
    information retrieval; program debugging; text analysis; DBTM approach; IR approach; IR-based feature; duplicate bug report detection; text-based information retrieval; textual document; topic modeling; topic-based feature; Duplicate Bug Reports; Information Retrieval; Topic Model;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Automated Software Engineering (ASE), 2012 Proceedings of the 27th IEEE/ACM International Conference on
  • Conference_Location
    Essen
  • Print_ISBN
    978-1-4503-1204-2
  • Type

    conf

  • DOI
    10.1145/2351676.2351687
  • Filename
    6494907