Title :
An Empirical Study of the Classification Performance of Learners on Imbalanced and Noisy Software Quality Data
Author :
Seiffert, Chris ; Khoshgoftaar, Taghi M. ; Van Hulse, Jason ; Folleco, Andres
Author_Institution :
Florida Atlantic Univ., Boca Raton
Abstract :
In the domain of software quality classification, data mining techniques are used to construct models (learners) for identifying software modules that are most likely to be fault-prone. The performance of these models, however, can be negatively affected by class imbalance and noise. Data sampling techniques have been proposed to alleviate the problem of class imbalance, but the impact of data quality on these techniques has not been adequately addressed. We examine the combined effects of noise and imbalance on classification performance when seven commonly used sampling techniques are applied to software quality measurement data. Our results show that some sampling techniques are more robust in the presence of noise than others. Further, sampling techniques are affected by noise differently given different levels of imbalance.
Keywords :
data mining; sampling methods; software quality; data mining techniques; data sampling techniques; software quality data; Data mining; Fault diagnosis; Noise level; Noise measurement; Sampling methods; Software engineering; Software measurement; Software performance; Software quality; Software systems
Conference_Title :
Information Reuse and Integration, 2007. IRI 2007. IEEE International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
1-4244-1500-4
Electronic_ISBN :
1-4244-1500-4
DOI :
10.1109/IRI.2007.4296694