DocumentCode
2350091
Title
Identifying learners robust to low quality data
Author
Folleco, Andres ; Khoshgoftaar, Taghi M. ; Van Hulse, Jason ; Bullard, Lofton
Author_Institution
Florida Atlantic University, Boca Raton, USA
fYear
2008
fDate
13-15 July 2008
Firstpage
190
Lastpage
195
Abstract
Real-world datasets commonly contain noise that is distributed in both the independent and dependent variables. Noise, which typically consists of erroneous variable values, has been shown to significantly affect the classification performance of learners. In this study, we identify learners with robust performance in the presence of low-quality (noisy) measurement data. Noise was injected into five class-imbalanced software engineering measurement datasets that were initially relatively free of noise. The experimental factors considered included the learner used, the level of injected noise, the dataset used (each with unique properties), and the percentage of minority instances containing noise. No other related studies were found that have identified learners that are robust in the presence of low-quality measurement data. Based on the results of this study, we recommend using the random forest learner for building classification models from noisy data.
Keywords
Data mining; Decision trees; Machine learning; Noise level; Noise measurement; Noise robustness; Software measurement; Support vector machine classification; Support vector machines; Working environment noise; learning performance; quality of data; random forest; software measurement data
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Reuse and Integration, 2008. IRI 2008. IEEE International Conference on
Conference_Location
Las Vegas, NV, USA
Print_ISBN
978-1-4244-2659-1
Electronic_ISBN
978-1-4244-2660-7
Type
conf
DOI
10.1109/IRI.2008.4583028
Filename
4583028