• DocumentCode
    3433483
  • Title
    Clustering via the Bayesian information criterion with applications in speech recognition
  • Author
    Chen, Scott Shaobing; Gopalakrishnan, P.S.
  • Author_Institution
    IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
  • Volume
    2
  • fYear
    1998
  • fDate
    12-15 May 1998
  • Firstpage
    645
  • Abstract
    One difficult problem we are often faced with in clustering analysis is how to choose the number of clusters. We propose to choose the number of clusters by optimizing the Bayesian information criterion (BIC), a model selection criterion in the statistics literature. We develop a termination criterion for the hierarchical clustering methods which optimizes the BIC criterion in a greedy fashion. The resulting algorithms are fully automatic. Our experiments on Gaussian mixture modeling and speaker clustering demonstrate that the BIC criterion is able to choose the number of clusters according to the intrinsic complexity present in the data.
  • Keywords
    Bayes methods; Gaussian processes; information theory; pattern classification; speech recognition; statistical analysis; Gaussian mixture modeling; automatic algorithms; clustering analysis; data complexity; experiments; greedy Bayesian information criterion; hierarchical clustering methods; model selection; speaker clustering; statistics; termination criterion; Bayesian methods; Clustering algorithms; Clustering methods; Data analysis; Hidden Markov models; Merging; Optimization methods
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on
  • Conference_Location
    Seattle, WA
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-4428-6
  • Type
    conf
  • DOI
    10.1109/ICASSP.1998.675347
  • Filename
    675347