Title :
Estimation of number of clusters in categorical data via distance-based likelihood function
Author :
Peng Zhang ; Yaolong Feng ; Xiaogang Wang
Author_Institution :
Dept. of Math. & Stat. Sci., Univ. of Alberta, Edmonton, AB, Canada
Abstract :
We propose a new approach to selecting the number of clusters in categorical data using a likelihood function based on Hamming distances. Properties of the distance random variable for categorical data and of the maximum likelihood estimators are discussed. The expected maximized log-likelihood for data from a single cluster is computed from simulated data. Changes in the maximized log-likelihood across different numbers of clusters are compared with thresholds obtained from their expected counterparts. The estimated number of clusters is the first integer at which the observed change is no longer significantly larger than the expected change. Simulation studies are carried out to examine the accuracy of the proposed method, and an example of real data analysis is also given.
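The selection rule outlined in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the likelihood function or the significance criterion, so this sketch assumes the maximized log-likelihoods and the thresholds are already available as plain lists; the function names `hamming` and `first_non_significant_k` and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def hamming(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of attributes on which two categorical records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


def first_non_significant_k(loglik: Sequence[float],
                            thresholds: Sequence[float]) -> int:
    """Return the first k whose gain from k to k + 1 clusters is not above threshold.

    loglik[i] is the maximized log-likelihood with i + 1 clusters.
    thresholds[i] is an assumed threshold for the gain from i + 1 to i + 2
    clusters (in the paper it comes from simulated single-cluster data;
    here it is simply supplied by the caller).
    """
    for i in range(len(loglik) - 1):
        gain = loglik[i + 1] - loglik[i]
        if gain <= thresholds[i]:
            return i + 1
    # Every observed gain exceeded its threshold: return the largest k tried.
    return len(loglik)


# Made-up values, for illustration only.
distance = hamming(("red", "small", "round"), ("red", "large", "round"))
k_hat = first_non_significant_k(
    loglik=[-520.0, -410.0, -350.0, -345.0, -343.0],
    thresholds=[12.0, 10.0, 9.0, 8.0],
)
```

With these made-up inputs the two records differ in one attribute, and the gains are 110, 60, 5 and 2, so the rule stops at three clusters. A simple "gain no larger than threshold" comparison stands in for the significance test, which the abstract leaves unspecified.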
Keywords :
category theory; data analysis; maximum likelihood estimation; pattern clustering; random processes; Hamming distances; categorical data analysis; cluster analysis; maximum log likelihood function; random variable; Approximation methods; Clustering algorithms; Data models; Educational institutions; Mathematical model
Conference_Title :
Natural Computation (ICNC), 2011 Seventh International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4244-9950-2
DOI :
10.1109/ICNC.2011.6022590