DocumentCode :
1575540
Title :
The Effects of Over and Under Sampling on Fault-prone Module Detection
Author :
Kamei, Yasutaka ; Monden, Akito ; Matsumoto, Shinsuke ; Kakimoto, Takeshi ; Matsumoto, Ken-ichi
Author_Institution :
Nara Inst. of Sci. & Technol., Nara
fYear :
2007
Firstpage :
196
Lastpage :
204
Abstract :
The goal of this paper is to improve the prediction performance of fault-prone module prediction models (fault-proneness models) by employing over/under sampling methods, which are preprocessing procedures applied to a fit dataset. The sampling methods are expected to improve prediction performance when the fit dataset is unbalanced, i.e., when there is a large difference between the number of fault-prone modules and the number of not-fault-prone modules. So far, no research has reported the effects of applying sampling methods to fault-proneness models. In this paper, we experimentally evaluated the effects of four sampling methods (random over sampling, synthetic minority over sampling, random under sampling and one-sided selection) applied to four fault-proneness models (linear discriminant analysis, logistic regression analysis, neural network and classification tree) using two module sets from industry legacy software. All four sampling methods improved the prediction performance of the linear and logistic models, while the neural network and classification tree models did not benefit from the sampling methods. For the linear and logistic models, the improvements in F1-value ranged from 0.078 to 0.224, with a mean of 0.121.
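The two simplest sampling methods named in the abstract, and the F1-value used to report the results, can be illustrated with a short sketch. This is not the authors' implementation: it assumes a binary labelling, and the function names and the labels "fp" (fault-prone) and "nfp" (not fault-prone) are hypothetical. Random over sampling duplicates randomly chosen minority-class modules until the classes are balanced, and random under sampling discards randomly chosen majority-class modules. Synthetic minority over sampling (SMOTE) and one-sided selection are more involved and are not shown.

```python
import random


def random_oversample(samples, labels, minority, seed=0):
    """Duplicate randomly chosen minority-class samples until both classes are equal in size.

    Assumes a binary problem: every label that is not `minority` is treated as the majority.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_count = len(labels) - len(minority_idx)
    if not minority_idx:
        raise ValueError("no minority-class samples to duplicate")
    # Draw with replacement from the minority class; an empty range means already balanced.
    extra = [rng.choice(minority_idx) for _ in range(majority_count - len(minority_idx))]
    new_samples = list(samples) + [samples[i] for i in extra]
    new_labels = list(labels) + [labels[i] for i in extra]
    return new_samples, new_labels


def random_undersample(samples, labels, minority, seed=0):
    """Keep every minority-class sample and an equally sized random subset of the majority class."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    # Sample without replacement from the majority class.
    kept = minority_idx + rng.sample(majority_idx, min(len(minority_idx), len(majority_idx)))
    kept.sort()  # preserve the original order of the retained samples
    return [samples[i] for i in kept], [labels[i] for i in kept]


def f1_score(actual, predicted, positive):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    modules = list(range(10))           # hypothetical module identifiers
    labels = ["fp"] * 2 + ["nfp"] * 8   # unbalanced: 2 fault-prone, 8 not fault-prone
    _, over_labels = random_oversample(modules, labels, "fp")
    _, under_labels = random_undersample(modules, labels, "fp")
    print(over_labels.count("fp"), over_labels.count("nfp"))    # 8 8
    print(under_labels.count("fp"), under_labels.count("nfp"))  # 2 2
    print(f1_score(["fp", "fp", "nfp", "nfp"], ["fp", "nfp", "fp", "nfp"], "fp"))  # 0.5
```

In this setup the sampling is applied only to the fit (training) dataset; the model would then be evaluated on untouched test modules, with F1 computed for the fault-prone class.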
Keywords :
program testing; sampling methods; software maintenance; classification tree; fault-prone module detection; fault-prone module prediction models; industry legacy software; linear discriminant analysis; logistic regression analysis; neural network; Accuracy; Classification tree analysis; Fault detection; Linear discriminant analysis; Logistics; Neural networks; Predictive models; Regression analysis; Sampling methods; Software engineering;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Empirical Software Engineering and Measurement, 2007. ESEM 2007. First International Symposium on
Conference_Location :
Madrid
ISSN :
1938-6451
Print_ISBN :
978-0-7695-2886-1
Type :
conf
DOI :
10.1109/ESEM.2007.28
Filename :
4343747