DocumentCode :
3060883
Title :
Predicting Faults in High Assurance Software
Author :
Seliya, Naeem ; Khoshgoftaar, Taghi M. ; Van Hulse, Jason
Author_Institution :
Comput. & Inf. Sci. Dept., Univ. of Michigan-Dearborn, Dearborn, MI, USA
fYear :
2010
fDate :
3-4 Nov. 2010
Firstpage :
26
Lastpage :
34
Abstract :
Reducing the number of latent software defects is a development goal that is particularly applicable to high assurance software systems. For such systems, the software measurement and defect data is highly skewed toward the not-fault-prone program modules, i.e., the number of fault-prone modules is relatively small. The skewed data problem, also known as class imbalance, poses a unique challenge when training a software quality estimation model. However, practitioners and researchers often build defect prediction models without regard to the skewed data problem. In high assurance systems, the class imbalance problem must be addressed when building defect predictors. This study investigates the roughly balanced bagging (RBBag) algorithm for building software quality models with data sets that suffer from class imbalance. The algorithm combines bagging and data sampling into one technique. A case study of 15 software measurement data sets from different real-world high assurance systems is used in our investigation of the RBBag algorithm. Two commonly used classification algorithms in the software engineering domain, Naive Bayes and C4.5 decision tree, are combined with RBBag for building the software quality models. The results demonstrate that defect prediction models based on the RBBag algorithm significantly outperform models built without any bagging or data sampling. The RBBag algorithm provides the analyst with a tool for effectively addressing class imbalance when training defect predictors during high assurance software development.
Keywords :
Bayes methods; decision trees; software fault tolerance; software metrics; software quality; C4.5 decision tree; class imbalance problem; classification algorithm; defect data; defect prediction model; fault prediction; high assurance software systems; naive Bayes; not-fault-prone program modules; roughly balanced bagging; software defects; software engineering domain; software measurement; software quality estimation model; software quality models; Bagging; Data models; Prediction algorithms; Predictive models; Software; Software measurement; bagging; classification; data sampling; defect prediction; imbalanced data; software measurements;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
High-Assurance Systems Engineering (HASE), 2010 IEEE 12th International Symposium on
Conference_Location :
San Jose, CA
ISSN :
1530-2059
Print_ISBN :
978-1-4244-9091-2
Electronic_ISBN :
1530-2059
Type :
conf
DOI :
10.1109/HASE.2010.29
Filename :
5634306