DocumentCode :
3207195
Title :
Rule-based noise detection for software measurement data
Author :
Khoshgoftaar, Taghi M. ; Seliya, Naeem ; Gao, Kehan
Author_Institution :
Dept. of Comput. Sci. & Eng., Florida Atlantic Univ., Boca Raton, FL, USA
fYear :
2004
fDate :
8-10 Nov. 2004
Firstpage :
302
Lastpage :
307
Abstract :
The quality of training data is an important issue for classification problems, such as classifying program modules into fault-prone and not fault-prone groups. Removing noisy instances improves data quality and, consequently, the performance of the classification model. We present a rule-based noise detection approach that detects noisy instances based on Boolean rules generated from the measurement data. The proposed approach is evaluated by injecting artificial noise into a clean (noise-free) software measurement dataset. The clean dataset is extracted from software measurement data of a NASA software project developed for real-time predictions. The simulated noise is injected into the attributes of the dataset at different noise levels. The number of attributes subjected to noise is also varied for the given dataset. We compare our approach to a classification filter, which treats misclassified instances as noisy data and eliminates them. It is shown that, for the different noise levels, the proposed approach has better efficiency in detecting noisy instances than the C4.5-based classification filter. In addition, the noise detection performance of our approach increases very rapidly with an increase in the number of attributes corrupted.
Keywords :
knowledge based systems; software metrics; software quality; Boolean rules generation; C4.5-based classification filter; NASA software project development; data quality; noise-free software measurement dataset; realtime prediction; rule-based noise detection; Data mining; Filtering; Filters; Labeling; NASA; Noise generators; Noise level; Noise measurement; Software measurement; Training data;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Reuse and Integration, 2004. IRI 2004. Proceedings of the 2004 IEEE International Conference on
Print_ISBN :
0-7803-8819-4
Type :
conf
DOI :
10.1109/IRI.2004.1431478
Filename :
1431478