• DocumentCode
    1573555
  • Title

    Clustering approaches for data with missing values: Comparison and evaluation

  • Author

    Himmelspach, Ludmila ; Conrad, Stefan

  • Author_Institution
    Inst. of Comput. Sci., Heinrich-Heine-Univ. Düsseldorf, Düsseldorf, Germany
  • fYear
    2010
  • Firstpage
    19
  • Lastpage
    28
  • Abstract
    Traditional clustering methods were developed to analyse complete data sets. Faults during data collection, data transfer or data cleaning often lead to missing values, so that common clustering methods cannot be used for the analysis. In such cases, clustering methods that can handle missing values are of great use. In this paper we discuss different approaches proposed in the literature for adapting partitioning clustering algorithms to data with missing values. We analyse them on two appropriate data sets and compare them with each other. We pay particular attention to how the accuracy of these methods depends on the missing-data mechanism and on the percentage of missing values in the data sets.
  • Keywords
    data handling; data mining; data cleaning; data clustering; data collection; data missing value determination; data set analysis; data transfer; partitioning clustering algorithms; Accuracy; Clustering algorithms; Clustering methods; Distributed databases; Estimation; Partitioning algorithms; Prototypes
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Digital Information Management (ICDIM), 2010 Fifth International Conference on
  • Conference_Location
    Thunder Bay, ON
  • Print_ISBN
    978-1-4244-7572-8
  • Type

    conf

  • DOI
    10.1109/ICDIM.2010.5664691
  • Filename
    5664691