DocumentCode
3580746
Title
Comparison of distance and dissimilarity measures for clustering data with mix attribute types
Author
Prasetyo, Hermawan ; Purwarianti, Ayu
Author_Institution
Sch. of Electr. Eng. & Inf., Inst. Teknol. Bandung, Bandung, Indonesia
fYear
2014
Firstpage
276
Lastpage
280
Abstract
Clustering is one of the most popular methods in data mining. Many algorithms can cluster data with numeric or categorical attributes. However, most real-world data contain both numeric and categorical attributes, so a clustering method that can handle mixed attribute types becomes important. The k-prototypes algorithm is one of the algorithms that can cluster data with mixed attribute types. However, it has a drawback in its dissimilarity measure between categorical data. Selecting a proper dissimilarity measure for categorical data is thus important for improving its performance. This paper compares distance and dissimilarity measures for clustering data with mixed attribute types. We used the k-prototypes algorithm on UCI datasets, i.e., Echocardiogram, Hepatitis, and Zoo, to assign cluster membership to the objects. The Silhouette index was employed to evaluate the clustering results. The results show that Euclidean distance and Ratio on Mismatches dissimilarity are the best combination for clustering data with numeric and categorical attribute types, as shown by an average Silhouette index approaching 1. We therefore propose using Euclidean distance and Ratio on Mismatches dissimilarity in the k-prototypes algorithm to cluster data with mixed attribute types.
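The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not give formulas, so the sketch assumes that "Ratio on Mismatches" means the number of mismatched categorical attributes divided by the number of categorical attributes, and that the numeric and categorical parts are summed with a weight `gamma`. The function names, the `gamma` parameter, and the toy data are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two numeric attribute vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mismatch_ratio(a, b):
    """Assumed form of 'Ratio on Mismatches': mismatches / attribute count."""
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mixed_dissimilarity(num_a, cat_a, num_b, cat_b, gamma=1.0):
    """Numeric distance plus gamma-weighted categorical dissimilarity.

    The additive form and the gamma weight are assumptions for illustration.
    """
    return euclidean(num_a, num_b) + gamma * mismatch_ratio(cat_a, cat_b)


def silhouette(points, labels, dist):
    """Average Silhouette index for a labelled set of points."""
    scores = []
    for i, p in enumerate(points):
        # Collect distances from p to every other point, grouped by cluster.
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            # Singleton cluster or only one cluster: score taken as 0.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data (hypothetical): each object is (numeric attributes, categorical attributes).
objects = [
    ((0.0,), ("a",)),
    ((0.1,), ("a",)),
    ((5.0,), ("b",)),
    ((5.1,), ("b",)),
]
cluster_labels = [0, 0, 1, 1]


def object_dist(p, q):
    return mixed_dissimilarity(p[0], p[1], q[0], q[1])


toy_silhouette = silhouette(objects, cluster_labels, object_dist)
```

On this well-separated toy data the average Silhouette index comes out close to 1, which is the sense in which the abstract reads a value approaching 1 as a good clustering.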
Keywords
category theory; data mining; pattern clustering; Euclidean distance; UCI datasets; categorical data; clustering data; dissimilarity measures; distance measures; k-prototypes algorithm; silhouette index; clustering mix types data; distance and dissimilarity measures
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Technology, Computer and Electrical Engineering (ICITACEE), 2014 1st International Conference on
Print_ISBN
978-1-4799-6431-4
Type
conf
DOI
10.1109/ICITACEE.2014.7065756
Filename
7065756