Title :
Using search-based metric selection and oversampling to predict fault prone modules
Author :
Vivanco, R. ; Kamei, Y. ; Monden, A. ; Matsumoto, K. ; Jin, D.
Author_Institution :
Dept. of Comput. Sci., Univ. of Manitoba, Winnipeg, MB, Canada
Abstract :
Predictive models can detect fault-prone modules by using source code metrics as inputs to a classifier. However, numerous structural measures exist, each capturing different aspects of size, coupling and complexity. Identifying a metric subset that improves predictive performance would not only improve the model but also provide insight into the structural properties that lead to problematic modules. Another difficulty in building predictive models comes from unbalanced datasets, which are common in empirical software engineering because the majority of modules are not likely to be faulty. Oversampling attempts to overcome this imbalance by generating new training instances from the faulty modules. We present the results of applying search-based metric selection and oversampling to three NASA datasets. For these datasets, oversampling results in the largest improvement. Metric subset selection removed up to 52% of the metrics without decreasing the predictive performance gained with oversampling.
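The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how new instances are generated, so plain random replication of faulty modules is assumed here, and the genetic search itself is omitted (only the application of one candidate metric mask is shown). The function names `oversample_minority` and `select_metrics`, the toy data, and the mask are all hypothetical.

```python
import random


def oversample_minority(rows, labels, minority_label=1, seed=0):
    """Replicate minority-class rows at random until both classes are equal in size.

    Assumption: simple replication with replacement; the paper may use a
    different oversampling scheme.
    """
    rng = random.Random(seed)
    minority = [r for r, y in zip(rows, labels) if y == minority_label]
    majority_count = sum(1 for y in labels if y != minority_label)
    if not minority:
        raise ValueError("no minority-class instances to sample from")
    new_rows, new_labels = list(rows), list(labels)
    # Draw with replacement from the faulty modules until the classes balance.
    for _ in range(max(0, majority_count - len(minority))):
        new_rows.append(list(rng.choice(minority)))
        new_labels.append(minority_label)
    return new_rows, new_labels


def select_metrics(rows, mask):
    """Keep only the metric columns whose flag is set in mask (one candidate subset)."""
    return [[v for v, keep in zip(r, mask) if keep] for r in rows]


# Toy data (hypothetical): three metrics per module; label 1 marks a faulty module.
rows = [[120, 3, 7], [80, 1, 2], [95, 2, 3], [60, 1, 1], [300, 9, 21]]
labels = [0, 0, 0, 0, 1]

balanced_rows, balanced_labels = oversample_minority(rows, labels)
subset_rows = select_metrics(balanced_rows, mask=[1, 0, 1])
print(balanced_labels.count(0), balanced_labels.count(1))
print(len(subset_rows[0]))
```

In a search-based setting, a genetic algorithm would propose many such masks and score each by the performance of a classifier trained on the reduced, oversampled data; that scoring loop is left out of this sketch.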
Keywords :
genetic algorithms; program diagnostics; sampling methods; search problems; software metrics; software quality; NASA datasets; fault prone modules prediction; oversampling; predictive models; search based metric selection; software engineering; source code metrics; Accuracy; Classification algorithms; Measurement; Predictive models; Search problems; Software; Training; Dataset Oversampling; Genetic Algorithm; Software Quality Models; Source Code Metrics
Conference_Title :
Electrical and Computer Engineering (CCECE), 2010 23rd Canadian Conference on
Conference_Location :
Calgary, AB
Print_ISBN :
978-1-4244-5376-4
Electronic_ISBN :
0840-7789
DOI :
10.1109/CCECE.2010.5575249