Title :
On the Impact of Dissimilarity Measure in k-Modes Clustering Algorithm
Author :
Ng, Michael K. ; Li, Mark Junjie ; Huang, Joshua Zhexue ; He, Zengyou
Author_Institution :
Dept. of Math., Hong Kong Baptist Univ., Kowloon
fDate :
3/1/2007 12:00:00 AM
Abstract :
This correspondence describes extensions to the k-modes algorithm for clustering categorical data. By modifying the simple matching dissimilarity measure for categorical objects, a heuristic approach was developed in (Z. He et al., 2005) and (O. San et al., 2004) that allows the k-modes paradigm to obtain clusters with strong intra-similarity and to efficiently cluster large categorical data sets. The main aim of this paper is to rigorously derive the updating formula of the k-modes clustering algorithm with the new dissimilarity measure and to establish the convergence of the algorithm under the optimization framework.
Keywords :
data analysis; data mining; pattern clustering; categorical data; data clustering; k-modes clustering algorithm; matching dissimilarity measure; Algorithm design and analysis; Clustering algorithms; Convergence; Cost function; Database systems; Frequency measurement; Helium; clustering; k-modes algorithm; Algorithms; Artifacts; Artificial Intelligence; Cluster Analysis; Information Storage and Retrieval; Numerical Analysis, Computer-Assisted; Pattern Recognition, Automated; Reproducibility of Results; Sensitivity and Specificity
Journal_Title :
Pattern Analysis and Machine Intelligence, IEEE Transactions on
DOI :
10.1109/TPAMI.2007.53