Title :
Identifying noise in an attribute of interest
Author :
Khoshgoftaar, Taghi M. ; Van Hulse, Jason
Author_Institution :
Florida Atlantic Univ., Boca Raton, FL, USA
Abstract :
One of the most significant issues facing the data mining community is that of low-quality data. Real-world datasets are often inundated with various types of data integrity issues, particularly noisy data. In response to the difficulties created by low-quality data, we propose a novel technique to detect noisy instances relative to an attribute of interest (AOI). Any attribute in the dataset can be defined by the user as the attribute of interest. The technique outputs a noise ranking of instances relative to the chosen attribute. This approach can be iterated for any number of user-specified attributes of interest. The case study described in this work demonstrates how our technique may be used to detect class noise, which occurs when errors are present in the class or dependent variable. In this scenario, the class is declared to be the attribute of interest and an instance noise ranking relative to the class is provided. Our technique is compared to the well-known ensemble and classification filters that have previously been proposed for class noise detection. The results of this study demonstrate the effectiveness of our approach and show that our procedure is a useful tool for improving data quality.
Keywords :
data integrity; data mining; pattern classification; attribute of interest; class noise detection; classification filters; data cleaning; ensemble filters; instance noise ranking; low-quality data; noisy data; noisy instances; software quality; Algorithm design and analysis; Classification algorithms; Cleaning; Computer science; Costs; Data engineering; Filters; Software engineering
Conference_Title :
Machine Learning and Applications, 2005. Proceedings. Fourth International Conference on
Print_ISBN :
0-7695-2495-8
DOI :
10.1109/ICMLA.2005.39