Title of article :
Evaluation of Classifiers in Software Fault-Proneness Prediction
Author/Authors :
Karimian, F. - University of Kashan; Babamir, S. M. - University of Kashan
Pages :
19
From page :
149
To page :
167
Abstract :
The reliability of a software system depends on its fault-prone modules: the fewer fault-prone units the software contains, the more we may trust it. Therefore, if we can predict the number of fault-prone modules in a software system, we can judge its reliability. One of the contributing features in predicting fault-prone modules is the software metric, by which the software modules can be classified into fault-prone and non-fault-prone ones. To make such a classification, we investigated 17 classifier methods, whose features (attributes) were software metrics (39 metrics) and whose mining instances (software modules) came from 13 datasets reported by NASA. However, two important issues influence prediction accuracy when data mining methods are used: (1) selecting the best/most influential features (i.e. software metrics) when there is a wide diversity of them, and (2) sampling instances in order to balance the imbalanced mining instances, since with two imbalanced classes the classifier is biased towards the majority class. Based on feature selection and instance sampling, we considered 4 scenarios in appraising the 17 classifier methods for predicting fault-prone software modules. To select features, we used correlation-based feature selection (CFS), and to sample instances, we implemented the synthetic minority oversampling technique (SMOTE). The empirical results show that suitable sampling of software modules significantly influences the accuracy of predicting software reliability, whereas metric selection does not have a considerable effect on the prediction. Furthermore, among the classifiers studied, bagging, K*, and random forest perform best when the sampled instances are used as training data.
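The instance-sampling step named in the abstract (SMOTE) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `smote`, its parameters, and the toy data are assumptions made here for illustration. It shows only the core idea of the technique, which is to create each synthetic minority instance by interpolating between a minority instance and one of its k nearest minority neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority points.

    minority: list of numeric feature vectors (at least two points).
    k: number of nearest minority neighbours considered per base point.
    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Pick a random position on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class: four "fault-prone" modules described by two metrics.
fault_prone = [[1.0, 2.0], [2.0, 3.0], [3.0, 1.0], [2.5, 2.5]]
extra = smote(fault_prone, n_new=6, k=2, seed=1)
```

Because every synthetic point is a convex combination of two real minority points, it always lies between them in feature space, so the minority class grows without simply duplicating existing instances.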
Keywords :
Software Fault Prediction , Classifier Performance , Feature Selection , Data Sampling , Software Metric , Dependent Variable , Independent Variable
Journal title :
Journal of Artificial Intelligence Data Mining
Serial Year :
2017
Record number :
2449370