DocumentCode :
754987
Title :
Mining With Noise Knowledge: Error-Aware Data Mining
Author :
Wu, Xindong ; Zhu, Xingquan
Author_Institution :
Sch. of Comput. Sci. & Inf. Eng., Hefei Univ. of Technol., Hefei
Volume :
38
Issue :
4
fYear :
2008
fDate :
7/1/2008 12:00:00 AM
Firstpage :
917
Lastpage :
932
Abstract :
Real-world data mining deals with noisy information sources where data collection inaccuracy, device limitations, data transmission and discretization errors, or man-made perturbations frequently result in imprecise or vague data. Two common practices are either to adopt data cleansing approaches that enhance data consistency or simply to treat noisy data as quality sources and feed them into data mining algorithms. Either way may substantially sacrifice mining performance. In this paper, we consider an error-aware (EA) data mining design, which takes advantage of statistical error information (such as noise level and noise distribution) to improve data mining results. We assume that such noise knowledge is available in advance, and we propose a solution to incorporate it into the mining process. More specifically, we use noise knowledge to restore original data distributions, which are then used to rectify the model built from noise-corrupted data. We materialize this concept in the proposed EA naive Bayes classification algorithm. Experimental comparisons on real-world datasets demonstrate the effectiveness of this design.
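The abstract describes using known noise statistics to restore original attribute-value distributions and then using those restored distributions to rectify a naive Bayes model. The following is a minimal Python sketch of that idea under stated assumptions; it is not the authors' EA naive Bayes algorithm. It assumes a hypothetical symmetric noise model in which, with a known per-attribute probability p (the noise level), an attribute value is replaced by one of the other k-1 domain values chosen uniformly at random, and it assumes class labels are noise-free. The function names `restore_distribution`, `train_ea_nb`, and `predict`, and the use of Laplace smoothing, are illustrative choices.

```python
import math
from collections import Counter


def restore_distribution(observed, noise_level):
    """Invert an ASSUMED symmetric substitution-noise model.

    With probability p a value is swapped for one of the other k-1 values
    uniformly at random, so with q = p / (k - 1):
        P_obs(v) = q + P_orig(v) * (1 - p - q)
    Solving for P_orig gives (P_obs(v) - q) / (1 - p - q).
    """
    k = len(observed)
    if k < 2:
        return dict(observed)
    p = noise_level
    q = p / (k - 1)
    denom = 1.0 - p - q
    if denom <= 0:
        raise ValueError("noise level too high for the model to be inverted")
    # Clip negative estimates caused by sampling error, then renormalize.
    restored = {v: max((pv - q) / denom, 0.0) for v, pv in observed.items()}
    total = sum(restored.values())
    return {v: pv / total for v, pv in restored.items()}


def train_ea_nb(rows, labels, domains, noise_levels):
    """Build naive Bayes tables from noisy rows, rectifying each
    per-class attribute distribution with the known noise level."""
    class_counts = Counter(labels)
    n = len(labels)
    priors = {c: cnt / n for c, cnt in class_counts.items()}
    cond = {}  # (class, attribute index) -> restored P(value | class)
    for c in class_counts:
        for a, domain in enumerate(domains):
            counts = Counter(r[a] for r, y in zip(rows, labels) if y == c)
            # Laplace smoothing over the known attribute domain.
            total = class_counts[c] + len(domain)
            observed = {v: (counts[v] + 1) / total for v in domain}
            cond[(c, a)] = restore_distribution(observed, noise_levels[a])
    return priors, cond


def predict(priors, cond, x, eps=1e-9):
    """Return the class with the highest log-posterior score."""
    best, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for a, v in enumerate(x):
            # eps guards against log(0) when a restored probability is clipped to zero.
            score += math.log(cond[(c, a)].get(v, 0.0) + eps)
        if score > best_score:
            best, best_score = c, score
    return best
```

For example, with two classes and two binary attributes each corrupted at a 10% noise level, `train_ea_nb` first computes the smoothed observed conditionals, then pushes each one away from the uniform distribution that symmetric noise tends to produce, before `predict` applies the usual naive Bayes decision rule. A non-uniform noise distribution would instead require inverting a full noise transition matrix, which this sketch does not attempt.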
Keywords :
Bayes methods; data mining; error statistics; noise; pattern classification; error-aware data mining; man-made perturbation; naive Bayes classification algorithm; noise knowledge; noise-corrupted data; statistical error information; Classification algorithms; Computer science; Costs; Data communication; Data mining; Decision theory; Feeds; Mining industry; Niobium; Noise level; Classification; data mining; naive Bayes (NB); noise handling; noise knowledge;
fLanguage :
English
Journal_Title :
Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions on
Publisher :
IEEE
ISSN :
1083-4427
Type :
jour
DOI :
10.1109/TSMCA.2008.923034
Filename :
4544889