• DocumentCode
    1967450
  • Title
    Handling missing values via decomposition of the conditioned set
  • Author
    Shyu, Mei-Ling ; Kuruppu-Appuhamilage, Indika Priyantha ; Chen, Shu-Ching ; Chang, LiWu
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Miami, Coral Gables, FL, USA
  • fYear
    2005
  • fDate
    15-17 Aug. 2005
  • Firstpage
    199
  • Lastpage
    204
  • Abstract
    Real-world databases are seldom complete, so this paper proposes a framework for replacing missing values in a database. Good data quality can directly improve the performance of data mining algorithms in various applications. The proposed framework adopts basic concepts from conditional probability theory and develops an algorithm that handles both nominal and numerical values, addressing the inability of existing algorithms to handle both types with a high degree of accuracy. Several experiments are conducted, and the results demonstrate that the framework achieves high accuracy compared with most commonly used approaches, such as replacing missing values with the average, maximum, or minimum value.
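    The contrast drawn in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm and does not reproduce its decomposition of the conditioned set: it only shows the baseline replacements named above (average, maximum, minimum) next to a simple conditional-probability fill for a nominal attribute, which picks the most frequent value among records that agree on the conditioning attributes. The names MISSING, baseline_fill, and conditional_fill, and the fallback to the overall distribution, are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from statistics import mean

MISSING = None  # assumed marker for a missing value (hypothetical convention)


def baseline_fill(values, strategy="mean"):
    """Replace missing numeric entries with the mean, max, or min of the observed ones."""
    observed = [v for v in values if v is not MISSING]
    fill = {"mean": mean, "max": max, "min": min}[strategy](observed)
    return [fill if v is MISSING else v for v in values]


def conditional_fill(rows, target, given):
    """Fill a missing nominal `target` with the value maximizing the estimated
    P(target | given), i.e. the most frequent value among complete rows that
    share the same values for the attributes listed in `given`.

    Illustrative assumption: if no complete row matches the conditioning
    values, fall back to the overall most frequent value of `target`.
    """
    counts = defaultdict(Counter)  # conditioning values -> counts of target values
    overall = Counter()            # unconditional counts of target values
    for row in rows:
        if row[target] is not MISSING:
            key = tuple(row[g] for g in given)
            counts[key][row[target]] += 1
            overall[row[target]] += 1

    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        if new_row[target] is MISSING:
            key = tuple(row[g] for g in given)
            pool = counts.get(key) or overall
            new_row[target] = pool.most_common(1)[0][0]
        filled.append(new_row)
    return filled
```

    For example, given records with attributes "weather" and "umbrella", conditional_fill(rows, "umbrella", ["weather"]) fills each missing "umbrella" entry with the value seen most often for that record's "weather", whereas the baselines ignore the other attributes entirely.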
  • Keywords
    data mining; database management systems; probability; conditional probability theory; conditioned set decomposition; data mining algorithm; data quality; missing values handling; real-world database; Cleaning; Computer science; Data mining; Data preprocessing; Distributed computing; Distributed databases; Information systems; Laboratories; Multimedia databases; Multimedia systems;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    IRI-2005 IEEE International Conference on Information Reuse and Integration, 2005
  • Print_ISBN
    0-7803-9093-8
  • Type
    conf
  • DOI
    10.1109/IRI-05.2005.1506473
  • Filename
    1506473