DocumentCode :
3030128
Title :
Applying Novel Resampling Strategies To Software Defect Prediction
Author :
Pelayo, Lourdes ; Dick, Scott
Author_Institution :
Alberta Univ., Edmonton
fYear :
2007
fDate :
24-27 June 2007
Firstpage :
69
Lastpage :
72
Abstract :
Due to the tremendous complexity and sophistication of software, improving software reliability is an enormously difficult task. We study the software defect prediction problem, which focuses on predicting which modules will experience a failure during operation. Numerous studies have applied machine learning to software defect prediction; however, skewness in defect-prediction datasets usually undermines the learning algorithms. The resulting classifiers will often never predict the faulty minority class. This problem is well known in machine learning and is often referred to as learning from unbalanced datasets. We examine stratification, a widely used technique for learning unbalanced data that has received little attention in software defect prediction. Our experiments are focused on the SMOTE technique, which is a method of over-sampling minority-class examples. Our goal is to determine if SMOTE can improve recognition of defect-prone modules, and at what cost. Our experiments demonstrate that after SMOTE resampling, we have a more balanced classification. We found an improvement of at least 23% in the average geometric mean classification accuracy on four benchmark datasets.
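The abstract names two techniques without defining them: SMOTE over-sampling of the minority (defective) class, and the geometric mean classification accuracy used to score the results. Below is a minimal Python sketch of both. It assumes the commonly described SMOTE formulation, in which each synthetic sample is placed at a random point on the segment between a minority sample and one of its k nearest minority-class neighbours. It also assumes the usual definition of the geometric mean, g = sqrt(TPR × TNR). The function names `smote` and `geometric_mean`, the defaults `k=5` and `n_percent=100`, and the label convention (1 = defective module) are illustrative assumptions. They are not taken from this paper's experimental setup.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: Sequence[Sequence[float]], n_percent: int = 100,
          k: int = 5, seed: int = 0) -> List[Vector]:
    """Hypothetical SMOTE-style over-sampler (sketch, not the paper's code).

    n_percent: amount of over-sampling; 100 means one synthetic sample per
    original minority sample. This sketch only handles multiples of 100.
    """
    if n_percent % 100 != 0:
        raise ValueError("this sketch only handles multiples of 100")
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    per_sample = n_percent // 100
    synthetic: List[Vector] = []
    for i, x in enumerate(minority):
        # k nearest neighbours of x among the *other* minority samples
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: _euclidean(x, m))[:k]
        for _ in range(per_sample):
            nn = rng.choice(neighbours)
            gap = rng.random()  # uniform in [0, 1)
            # the new point lies on the segment between x and its neighbour
            synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


def geometric_mean(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """sqrt(TPR * TNR), treating label 1 as the defective (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tpr = tp / (tp + fn) if tp + fn else 0.0  # recall on defective modules
    tnr = tn / (tn + fp) if tn + fp else 0.0  # recall on clean modules
    return math.sqrt(tpr * tnr)


if __name__ == "__main__":
    # A classifier that never predicts the minority class scores 80% plain
    # accuracy here, yet its geometric mean is 0.0.
    print(geometric_mean([0, 0, 0, 0, 1], [0, 0, 0, 0, 0]))  # 0.0
    print(len(smote([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]], n_percent=200, k=2)))  # 6
```

The example shows why the geometric mean fits the skewed-data problem described in the abstract. Plain accuracy rewards a classifier that ignores the faulty minority class, but the geometric mean drops to zero whenever either per-class accuracy does. Interpolating between neighbours, instead of duplicating minority samples, is the design choice that distinguishes SMOTE from simple random over-sampling.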
Keywords :
learning (artificial intelligence); sampling methods; software metrics; software performance evaluation; software reliability; SMOTE technique; benchmark datasets; defect-prone modules; geometric mean classification accuracy; learning algorithms; machine learning; over-sampling minority-class examples; resampling strategy; software complexity; software defect prediction; software reliability; software sophistication; unbalanced datasets; Computer errors; Costs; Joining processes; Machine learning; Machine learning algorithms; Nearest neighbor searches; Sampling methods; Software algorithms; Software reliability; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Annual Meeting of the North American Fuzzy Information Processing Society, 2007 (NAFIPS '07)
Conference_Location :
San Diego, CA
Print_ISBN :
1-4244-1213-7
Electronic_ISBN :
1-4244-1214-5
Type :
conf
DOI :
10.1109/NAFIPS.2007.383813
Filename :
4271036