DocumentCode :
2754643
Title :
Noise Correction using Bayesian Multiple Imputation
Author :
Van Hulse, Jason ; Khoshgoftaar, Taghi M. ; Seiffert, Chris ; Zhao, Lili
Author_Institution :
Dept. of Comput. Sci. & Eng., Florida Atlantic Univ., Boca Raton, FL
fYear :
2006
fDate :
16-18 Sept. 2006
Firstpage :
478
Lastpage :
483
Abstract :
This work presents a novel procedure to detect and correct noise in a continuous dependent variable. The presence of noise in a dataset represents a significant challenge to data mining algorithms, as incorrect values in both the independent and dependent variables can severely corrupt the results of even robust learners. The problem of noise is especially severe when it is located in the dependent variable. In the worst case, severe noise in one of the independent variables can be handled by eliminating that attribute from the dataset, provided that the practitioner knows that noise is present. In the setting of supervised learning, the dependent variable is the most critical attribute in the dataset and therefore cannot be eliminated even if significant noise is present. Noise-handling procedures in relation to the dependent variable are therefore absolutely critical to the success of a supervised learning initiative. Compared with noise in a binary dependent variable or class, noise in a continuous dependent variable presents many additional difficulties. Our procedure to detect and correct noise in a continuous dependent variable uses Bayesian multiple imputation, which was initially developed to combat the problem of missing data. Our case study considers a real-world software measurement dataset called CCCS, which has a numeric dependent variable with inherent noise. Our experiments show very encouraging results and clearly demonstrate the utility of our procedure.
Keywords :
belief networks; data mining; learning (artificial intelligence); Bayesian multiple imputation; command-control-communication system; noise correction; software measurement dataset; supervised learning; Bayesian methods; Computer science; Costs; Databases; Laboratories; Noise robustness; Software engineering; Software measurement
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Reuse and Integration, 2006 IEEE International Conference on
Conference_Location :
Waikoloa Village, HI
Print_ISBN :
0-7803-9788-6
Type :
conf
DOI :
10.1109/IRI.2006.252461
Filename :
4018538