DocumentCode
2400983
Title
Explaining software defects using topic models
Author
Chen, Tse-Hsun ; Thomas, Stephen W. ; Nagappan, Meiyappan ; Hassan, Ahmed E.
Author_Institution
Software Anal. & Intell. Lab. (SAIL), Queen's Univ., Kingston, ON, Canada
fYear
2012
fDate
2-3 June 2012
Firstpage
189
Lastpage
198
Abstract
Researchers have proposed various metrics based on measurable aspects of the source code entities (e.g., methods, classes, files, or modules) and the social structure of a software project in an effort to explain the relationships between software development and software defects. However, these metrics largely ignore the actual functionality, i.e., the conceptual concerns, of a software system, which are the main technical concepts that reflect the business logic or domain of the system. For instance, while lines of code may be a good general measure for defects, a large entity responsible for simple I/O tasks is likely to have fewer defects than a small entity responsible for complicated compiler implementation details. In this paper, we study the effect of conceptual concerns on code quality. We use a statistical topic modeling technique to approximate software concerns as topics; we then propose various metrics on these topics to help explain the defect-proneness (i.e., quality) of the entities. Paramount to our proposed metrics is that they take into account the defect history of each topic. Case studies on multiple versions of Mozilla Firefox, Eclipse, and Mylyn show that (i) some topics are much more defect-prone than others, (ii) defect-prone topics tend to remain so over time, and (iii) defect-prone topics provide additional explanatory power for code quality over existing structural and historical metrics.
Keywords
software fault tolerance; software metrics; software quality; statistical analysis; Eclipse; I/O tasks; Mozilla Firefox; Mylyn; business logic; code lines; code quality; defect-proneness; historical metrics; software defects; software development; software project; source code entities; statistical topic modeling technique; structural metrics; Correlation; Fires; History; Java; Measurement; Software systems; software concerns; topic modeling
fLanguage
English
Publisher
ieee
Conference_Titel
Mining Software Repositories (MSR), 2012 9th IEEE Working Conference on
Conference_Location
Zurich
ISSN
2160-1852
Print_ISBN
978-1-4673-1760-3
Type
conf
DOI
10.1109/MSR.2012.6224280
Filename
6224280