DocumentCode
3350319
Title
Estimation of number of clusters in categorical data via distance-based likelihood function
Author
Peng Zhang ; Yaolong Feng ; Xiaogang Wang
Author_Institution
Dept. of Math. & Stat. Sci., Univ. of Alberta, Edmonton, AB, Canada
Volume
4
fYear
2011
fDate
26-28 July 2011
Firstpage
2396
Lastpage
2400
Abstract
We propose a new approach to selecting the number of clusters for categorical data, using a likelihood function based on Hamming distances. We discuss the properties of the distance between categorical observations as a random variable, and of the maximum likelihood estimators. An expected maximized log-likelihood function for data from a single cluster is computed using simulated data. Changes in the maximized log-likelihood function across different numbers of clusters are compared with thresholds obtained from their expected counterparts. The estimated number of clusters is the first integer at which the observed change is no longer significantly larger than the expected change. Simulation studies are carried out to examine the accuracy of the proposed method, and an example of real data analysis is also given.
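The selection rule described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the form of the distance-based likelihood, so the maximized log-likelihoods and the simulation-derived thresholds are taken as given inputs. The function names `hamming_distance` and `estimate_num_clusters`, and the convention that the gain from k to k+1 clusters is the quantity compared with the threshold, are assumptions made for this example.

```python
from typing import Sequence


def hamming_distance(a: Sequence, b: Sequence) -> int:
    """Count the attributes on which two categorical records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


def estimate_num_clusters(loglik: Sequence[float],
                          thresholds: Sequence[float]) -> int:
    """Pick the first k whose gain from adding a cluster is within the threshold.

    loglik[i]     : maximized log-likelihood with k = i + 1 clusters
                    (assumed to be computed elsewhere).
    thresholds[i] : hypothetical threshold for the gain from k = i + 1 to
                    k = i + 2, assumed to come from simulated one-cluster data.
    """
    if len(thresholds) < len(loglik) - 1:
        raise ValueError("need one threshold per consecutive pair of k values")
    for i in range(len(loglik) - 1):
        gain = loglik[i + 1] - loglik[i]
        # Stop at the first k where the observed gain no longer exceeds
        # the gain expected under a single cluster.
        if gain <= thresholds[i]:
            return i + 1
    # Every gain exceeded its threshold: return the largest k examined.
    return len(loglik)
```

With illustrative (made-up) inputs, `estimate_num_clusters([-100, -60, -55, -54], [10, 10, 10])` stops at k = 2, because the gain from two to three clusters (5) is within the threshold (10).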
Keywords
category theory; data analysis; maximum likelihood estimation; pattern clustering; random processes; Hamming distances; categorical data analysis; cluster analysis; maximum log likelihood function; random variable; Approximation methods; Clustering algorithms; Data models; Educational institutions; Hamming distance; Mathematical model
fLanguage
English
Publisher
IEEE
Conference_Titel
Natural Computation (ICNC), 2011 Seventh International Conference on
Conference_Location
Shanghai
ISSN
2157-9555
Print_ISBN
978-1-4244-9950-2
Type
conf
DOI
10.1109/ICNC.2011.6022590
Filename
6022590