DocumentCode
3424497
Title
Identifying noise in an attribute of interest
Author
Khoshgoftaar, Taghi M. ; Van Hulse, Jason
Author_Institution
Florida Atlantic Univ., Boca Raton, FL, USA
fYear
2005
fDate
15-17 Dec. 2005
Abstract
One of the most significant issues facing the data mining community is that of low-quality data. Real-world datasets are often inundated with various types of data integrity issues, particularly noisy data. In response to the difficulties created by low-quality data, we propose a novel technique to detect noisy instances relative to an attribute of interest (AOI). Any attribute in the dataset can be defined by the user as the attribute of interest. A noise ranking of instances relative to the chosen attribute is output. This approach can be iterated for any number of user-specified attributes of interest. The case study described in this work demonstrates how our technique may be used to detect class noise, which occurs when errors are present in the class or dependent variable. In this scenario the class is declared to be the attribute of interest and an instance noise ranking relative to the class is provided. Our technique is compared to the well-known ensemble and classification filters which have been previously proposed for class noise detection. The results of this study demonstrate the effectiveness of our approach and show that our procedure is a useful tool for improving data quality.
Keywords
data integrity; data mining; pattern classification; attribute of interest; class noise detection; classification filters; data cleaning; ensemble filters; instance noise ranking; low-quality data; noisy data; noisy instances; software quality; Algorithm design and analysis; Classification algorithms; Cleaning; Computer science; Costs; Data engineering; Data mining; Filters; Software engineering; Software quality
fLanguage
English
Publisher
IEEE
Conference_Title
Machine Learning and Applications, 2005. Proceedings. Fourth International Conference on
Print_ISBN
0-7695-2495-8
Type
conf
DOI
10.1109/ICMLA.2005.39
Filename
1607431