DocumentCode :
589292
Title :
A Hybrid Approach to Coping with High Dimensionality and Class Imbalance for Software Defect Prediction
Author :
Gao, Kehan ; Khoshgoftaar, Taghi M. ; Napolitano, Antonio
Author_Institution :
Eastern Connecticut State Univ., Willimantic, CT, USA
Volume :
2
fYear :
2012
fDate :
12-15 Dec. 2012
Firstpage :
281
Lastpage :
288
Abstract :
High dimensionality and class imbalance are two main problems affecting software defect prediction. In this paper, we propose a new technique, named SelectRUSBoost, a form of ensemble learning that incorporates data sampling to alleviate class imbalance and feature selection to resolve high dimensionality. To evaluate the effectiveness of the new technique, we apply it to a group of datasets in the context of software defect prediction. We employ two classification learners and six feature selection techniques. We compare the technique to the approach where feature selection and data sampling are used together, as well as to the case where feature selection is used alone (with no sampling at all). The experimental results demonstrate that the SelectRUSBoost technique is more effective at improving classification performance than the other approaches.
Keywords :
data handling; learning (artificial intelligence); software engineering; SelectRUSBoost; class imbalance; data sampling; ensemble learning; high dimensionality; software defect prediction; Boosting; Data models; Measurement; Prediction algorithms; Predictive models; Software; Support vector machines
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning and Applications (ICMLA), 2012 11th International Conference on
Conference_Location :
Boca Raton, FL
Print_ISBN :
978-1-4673-4651-1
Type :
conf
DOI :
10.1109/ICMLA.2012.145
Filename :
6406710