  • DocumentCode
    3424497
  • Title
    Identifying noise in an attribute of interest
  • Author
    Khoshgoftaar, Taghi M.; Van Hulse, Jason
  • Author_Institution
    Florida Atlantic Univ., Boca Raton, FL, USA
  • fYear
    2005
  • fDate
    15-17 Dec. 2005
  • Abstract
    One of the most significant issues facing the data mining community is that of low-quality data. Real-world datasets are often inundated with various types of data integrity issues, particularly noisy data. In response to the difficulties created by low-quality data, we propose a novel technique to detect noisy instances relative to an attribute of interest (AOI). Any attribute in the dataset can be defined by the user as the attribute of interest. A noise ranking of instances relative to the chosen attribute is output. This approach can be iterated for any number of user-specified attributes of interest. The case study described in this work demonstrates how our technique may be used to detect class noise, which occurs when errors are present in the class or dependent variable. In this scenario, the class is declared to be the attribute of interest, and an instance noise ranking relative to the class is provided. Our technique is compared to the well-known ensemble and classification filters, which have been previously proposed for class noise detection. The results of this study demonstrate the effectiveness of our approach and show that our procedure is a useful tool for improving data quality.
  • Keywords
    data integrity; data mining; pattern classification; attribute of interest; class noise detection; classification filters; data cleaning; ensemble filters; instance noise ranking; low-quality data; noisy data; noisy instances; software quality; Algorithm design and analysis; Classification algorithms; Cleaning; Computer science; Costs; Data engineering; Filters; Software engineering
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Applications, 2005. Proceedings. Fourth International Conference on
  • Print_ISBN
    0-7695-2495-8
  • Type
    conf
  • DOI
    10.1109/ICMLA.2005.39
  • Filename
    1607431