  • DocumentCode
    3350319
  • Title
    Estimation of number of clusters in categorical data via distance-based likelihood function
  • Author
    Peng Zhang; Yaolong Feng; Xiaogang Wang
  • Author_Institution
    Dept. of Math. & Stat. Sci., Univ. of Alberta, Edmonton, AB, Canada
  • Volume
    4
  • fYear
    2011
  • fDate
    26-28 July 2011
  • Firstpage
    2396
  • Lastpage
    2400
  • Abstract
    We propose a new approach to selecting the number of clusters for categorical data via a likelihood function based on Hamming distances. Properties of the distance random variable for categorical data and of the maximum likelihood estimators are discussed. The expected maximized log-likelihood for data from a single cluster is computed using simulated data. Changes in the maximized log-likelihood across different numbers of clusters are compared with thresholds obtained from their expected counterparts. The estimated number of clusters is the first integer at which the former change is no longer significantly larger than the latter. Simulation studies are carried out to examine the accuracy of the proposed method, and an example of real data analysis is also given.
  • Keywords
    category theory; data analysis; maximum likelihood estimation; pattern clustering; random processes; Hamming distances; categorical data analysis; cluster analysis; maximum log likelihood function; random variable; Approximation methods; Clustering algorithms; Data models; Educational institutions; Hamming distance; Mathematical model
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Computation (ICNC), 2011 Seventh International Conference on
  • Conference_Location
    Shanghai
  • ISSN
    2157-9555
  • Print_ISBN
    978-1-4244-9950-2
  • Type
    conf
  • DOI
    10.1109/ICNC.2011.6022590
  • Filename
    6022590
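
  • Note
    The abstract builds its likelihood on Hamming distances between categorical records. As a minimal sketch of that distance only (not the paper's likelihood function or its threshold rule), the Python below counts mismatched attributes between records and measures each record's distance to an attribute-wise mode. The helper names `hamming` and `mode_center`, the use of the mode as a cluster center, and the toy records are all assumptions made for illustration; none of them come from the paper.

```python
from collections import Counter


def hamming(a, b):
    """Count the attributes on which two equal-length categorical records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


def mode_center(records):
    """Hypothetical cluster center: the most frequent category of each attribute."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))


# Toy data invented for this sketch: four records with three categorical attributes.
records = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("red", "large", "round"),
    ("blue", "small", "round"),
]

center = mode_center(records)
distances = [hamming(record, center) for record in records]

print(center)     # ('red', 'small', 'round')
print(distances)  # [0, 1, 1, 1]
```

    Distances of this kind are the raw quantities a distance-based likelihood would be fitted to; how the paper models their distribution is not stated in this record.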