DocumentCode :
3350319
Title :
Estimation of number of clusters in categorical data via distance-based likelihood function
Author :
Peng Zhang ; Yaolong Feng ; Xiaogang Wang
Author_Institution :
Dept. of Math. & Stat. Sci., Univ. of Alberta, Edmonton, AB, Canada
Volume :
4
fYear :
2011
fDate :
26-28 July 2011
Firstpage :
2396
Lastpage :
2400
Abstract :
We propose a new approach to selecting the number of clusters for categorical data via a likelihood function based on Hamming distances. Properties of the distance random variable for categorical data and of the maximum likelihood estimators are discussed. An expected maximized log-likelihood function for data from a single cluster is computed using simulated data. Changes in the maximized log-likelihood function across different numbers of clusters are compared with thresholds obtained from the expected counterparts. The estimated number of clusters is chosen to be the first integer at which the former change is no longer significantly larger than the latter. Simulation studies are carried out to examine the accuracy of the proposed method. We also give an example of real data analysis in the paper.
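The selection rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the indexing convention (the gain from k to k+1 clusters is tested against a threshold for k), and the numbers in the usage lines are all assumptions made for illustration. The abstract does not specify how the log-likelihoods or thresholds are computed, so both are taken here as precomputed inputs.

```python
from typing import Sequence


def hamming_distance(a: Sequence, b: Sequence) -> int:
    """Count the attributes on which two categorical records disagree."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


def first_k_without_significant_gain(
    loglik: Sequence[float], thresholds: Sequence[float]
) -> int:
    """Pick a number of clusters from a sequence of log-likelihoods.

    Assumed convention (hypothetical, not taken from the paper):
    loglik[i] is the maximized log-likelihood with k = i + 1 clusters, and
    thresholds[i] is the reference change for moving from k = i + 1 to
    k = i + 2 clusters. The first k whose gain does not exceed its
    threshold is returned.
    """
    if len(thresholds) < len(loglik) - 1:
        raise ValueError("need one threshold per consecutive pair of k values")
    for i in range(len(loglik) - 1):
        gain = loglik[i + 1] - loglik[i]
        if gain <= thresholds[i]:
            return i + 1
    # No candidate met the stopping rule; fall back to the largest k tried.
    return len(loglik)


# Made-up values, for illustration only.
d = hamming_distance(("red", "small", "round"), ("red", "large", "round"))
k_hat = first_k_without_significant_gain(
    [-500.0, -420.0, -380.0, -375.0, -373.0], [10.0, 10.0, 10.0, 10.0]
)
```

With these made-up inputs the gains are 80, 40, 5 and 2, so the rule stops at the third candidate. In the paper the thresholds come from simulated single-cluster data rather than fixed constants.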
Keywords :
category theory; data analysis; maximum likelihood estimation; pattern clustering; random processes; Hamming distances; categorical data analysis; cluster analysis; maximum likelihood estimation; maximum log likelihood function; random variable; Approximation methods; Clustering algorithms; Data models; Educational institutions; Hamming distance; Mathematical model; Maximum likelihood estimation;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Natural Computation (ICNC), 2011 Seventh International Conference on
Conference_Location :
Shanghai
ISSN :
2157-9555
Print_ISBN :
978-1-4244-9950-2
Type :
conf
DOI :
10.1109/ICNC.2011.6022590
Filename :
6022590