• DocumentCode
    3580746
  • Title

    Comparison of distance and dissimilarity measures for clustering data with mix attribute types

  • Author

    Prasetyo, Hermawan ; Purwarianti, Ayu

  • Author_Institution
    Sch. of Electr. Eng. & Inf., Inst. Teknol. Bandung, Bandung, Indonesia
  • fYear
    2014
  • Firstpage
    276
  • Lastpage
    280
  • Abstract
    Clustering is one of the most popular methods in data mining. Many algorithms can cluster data with numeric or categorical attributes. However, most real-world data contain both numeric and categorical attributes, so a clustering method that can handle mixed attribute types becomes important. The k-prototypes algorithm is one of the algorithms that can cluster data with mixed attribute types. However, it has a drawback in its dissimilarity measure between categorical data. Selecting a proper dissimilarity measure between categorical data is thus important for improving its performance. This paper compares distance and dissimilarity measures for clustering data with mixed attribute types. We used the k-prototypes algorithm on UCI datasets, i.e., Echocardiogram, Hepatitis, and Zoo, to assign cluster membership of the objects. The Silhouette index was employed to evaluate the clustering results. The results show that Euclidean distance and Ratio on Mismatches dissimilarity are the best combination for clustering data with numeric and categorical attribute types, as shown by an average Silhouette index approaching 1. We therefore propose employing Euclidean distance and Ratio on Mismatches dissimilarity in the k-prototypes algorithm to cluster data with mixed attribute types.
  • Keywords
    category theory; data mining; pattern clustering; Euclidean distance; UCI datasets; categorical data; clustering data; dissimilarity measures; distance measures; k-prototypes algorithm; silhouette index; clustering mix types data; distance and dissimilarity measures
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology, Computer and Electrical Engineering (ICITACEE), 2014 1st International Conference on
  • Print_ISBN
    978-1-4799-6431-4
  • Type

    conf

  • DOI
    10.1109/ICITACEE.2014.7065756
  • Filename
    7065756