  • DocumentCode
    2400983
  • Title
    Explaining software defects using topic models
  • Author
    Chen, Tse-Hsun ; Thomas, Stephen W. ; Nagappan, Meiyappan ; Hassan, Ahmed E.
  • Author_Institution
    Software Anal. & Intell. Lab. (SAIL), Queen's Univ., Kingston, ON, Canada
  • fYear
    2012
  • fDate
    2-3 June 2012
  • Firstpage
    189
  • Lastpage
    198
  • Abstract
    Researchers have proposed various metrics based on measurable aspects of the source code entities (e.g., methods, classes, files, or modules) and the social structure of a software project in an effort to explain the relationships between software development and software defects. However, these metrics largely ignore the actual functionality, i.e., the conceptual concerns, of a software system, which are the main technical concepts that reflect the business logic or domain of the system. For instance, while lines of code may be a good general measure for defects, a large entity responsible for simple I/O tasks is likely to have fewer defects than a small entity responsible for complicated compiler implementation details. In this paper, we study the effect of conceptual concerns on code quality. We use a statistical topic modeling technique to approximate software concerns as topics; we then propose various metrics on these topics to help explain the defect-proneness (i.e., quality) of the entities. Paramount to our proposed metrics is that they take into account the defect history of each topic. Case studies on multiple versions of Mozilla Firefox, Eclipse, and Mylyn show that (i) some topics are much more defect-prone than others, (ii) defect-prone topics tend to remain so over time, and (iii) defect-prone topics provide additional explanatory power for code quality over existing structural and historical metrics.
  • Keywords
    software fault tolerance; software metrics; software quality; statistical analysis; Eclipse; I/O tasks; Mozilla Firefox; Mylyn; business logic; code lines; code quality; defect-proneness; historical metrics; software defects; software development; software project; source code entities; statistical topic modeling technique; structural metrics; Correlation; Fires; History; Java; Measurement; Software systems; code quality; software concerns; topic modeling;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Mining Software Repositories (MSR), 2012 9th IEEE Working Conference on
  • Conference_Location
    Zurich
  • ISSN
    2160-1852
  • Print_ISBN
    978-1-4673-1760-3
  • Type
    conf
  • DOI
    10.1109/MSR.2012.6224280
  • Filename
    6224280