DocumentCode
3565632
Title
Comparison of distance measures for clustering data with mix attribute types for Indonesian potential-based regional grouping
Author
Prasetyo, Hermawan ; Purwarianti, Ayu
Author_Institution
Sch. of Electr. Eng. & Inf., Inst. Teknol. Bandung, Bandung, Indonesia
fYear
2014
Firstpage
13
Lastpage
18
Abstract
Every region in Indonesia has different potentials, which need to be analyzed for national development considerations. This analysis can be accomplished by clustering Indonesian regional potential data, which are collected from the PODES enumeration. These data consist of both numeric and categorical attributes. However, most clustering algorithms can be applied only to either numeric or categorical data. The k-prototypes algorithm, a clustering algorithm that can deal with mixed data types, has limitations, such as its distance measurement. Selecting distance measures properly is thus important for improving its performance. This paper presents a comparison of distance measures for clustering data with mixed attribute types. We applied the k-prototypes algorithm with several distance measures to the PODES11-DESA dataset and used the Silhouette index for clustering evaluation. The results show that the best clustering is achieved by applying the Ratio on Mismatches distance to categorical attributes. For numeric attributes, no single distance measure performs best, since the performance of the numeric distance measures varies across treatments.
Keywords
distance measurement; pattern clustering; regional planning; Indonesian potential-based regional grouping; Indonesian regional potential data; PODES enumeration; PODES11-DESA dataset; categorical attribute; clustering algorithm; clustering evaluation; distance measurement; k-prototypes algorithm; mix attribute type data clustering; mix data type; national development consideration; numeric attribute; numeric distance measure; silhouette index; Algorithm design and analysis; Chebyshev approximation; Clustering algorithms; Educational institutions; Indexes; Prototypes; Sociology; clustering mix attribute types; distance measures; k-prototypes algorithm; regional potentials;
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Technology Systems and Innovation (ICITSI), 2014 International Conference on
Type
conf
DOI
10.1109/ICITSI.2014.7048230
Filename
7048230