Title :
Comparison of distance and dissimilarity measures for clustering data with mix attribute types
Author :
Prasetyo, Hermawan ; Purwarianti, Ayu
Author_Institution :
Sch. of Electr. Eng. & Inf., Inst. Teknol. Bandung, Bandung, Indonesia
Abstract :
Clustering is one of the most popular methods in data mining. Many algorithms can cluster data with either numeric or categorical attributes. However, most real-world data contain both numeric and categorical attributes, so clustering methods that handle mixed attribute types are important. The k-prototypes algorithm is one of the algorithms that can cluster data with mixed attribute types. However, it has a drawback in its dissimilarity measure between categorical data. Selecting a proper dissimilarity measure for categorical data is therefore important for improving its performance. This paper compares distance and dissimilarity measures for clustering data with mixed attribute types. We used the k-prototypes algorithm on UCI datasets, namely Echocardiogram, Hepatitis, and Zoo, to assign cluster membership to the objects. The Silhouette index was employed to evaluate the clustering results. The results show that Euclidean distance and Ratio on Mismatches dissimilarity are the best combination for clustering data with numeric and categorical attribute types, as shown by an average Silhouette index approaching 1. We therefore propose using Euclidean distance and Ratio on Mismatches dissimilarity in the k-prototypes algorithm to cluster data with mixed attribute types.
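As a rough illustration of the kind of combined measure the abstract describes, the sketch below adds a Euclidean distance over numeric attributes to a mismatch-based dissimilarity over categorical attributes. It is not the authors' implementation. It assumes that "Ratio on Mismatches" means the fraction of categorical attributes on which two objects differ, and the names `mixed_dissimilarity` and `gamma` (a weight on the categorical part, as commonly used with k-prototypes) are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mismatch_ratio(a, b):
    """Fraction of categorical attributes whose values differ.

    Assumption: this is what "Ratio on Mismatches" refers to.
    """
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mixed_dissimilarity(num_a, cat_a, num_b, cat_b, gamma=1.0):
    """Numeric distance plus gamma-weighted categorical dissimilarity.

    gamma is a hypothetical weight balancing the two parts.
    """
    return euclidean(num_a, num_b) + gamma * mismatch_ratio(cat_a, cat_b)


# Two objects with two numeric and three categorical attributes each.
d = mixed_dissimilarity([1.0, 2.0], ["a", "x", "m"],
                        [4.0, 6.0], ["a", "y", "n"])
print(d)
```

In a k-prototypes setting, a function of this shape would be evaluated between each object and each cluster prototype, with the object assigned to the prototype that gives the smallest value.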
Keywords :
category theory; data mining; pattern clustering; Euclidean distance; UCI datasets; categorical data; clustering data; dissimilarity measures; distance measures; k-prototypes algorithm; silhouette index; clustering mix types data; distance and dissimilarity measures
Conference_Title :
Information Technology, Computer and Electrical Engineering (ICITACEE), 2014 1st International Conference on
Print_ISBN :
978-1-4799-6431-4
DOI :
10.1109/ICITACEE.2014.7065756