  • DocumentCode
    399777
  • Title
    Model-based clustering with soft balancing
  • Author
    Zhong, Shi; Ghosh, Joydeep
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Florida Atlantic Univ., Boca Raton, FL, USA
  • fYear
    2003
  • fDate
    19-22 Nov. 2003
  • Firstpage
    459
  • Lastpage
    466
  • Abstract
    Balanced clustering algorithms can be useful in a variety of applications and have recently attracted increasing research interest. Most recent work, however, addressed only hard balancing by constraining each cluster to have equal or a certain minimum number of data objects. We provide a soft balancing strategy built upon a soft mixture-of-models clustering framework. This strategy constrains the sum of posterior probabilities of object membership for each cluster to be equal and thus balances the expected number of data objects in each cluster. We first derive soft model-based clustering from an information-theoretic viewpoint and then show that the proposed balanced clustering can be parameterized by a temperature parameter that controls the softness of clustering as well as that of balancing. As the temperature decreases, the resulting partitioning becomes more and more balanced. In the limit, when temperature becomes zero, the balancing becomes hard and the actual partitioning becomes perfectly balanced. The effectiveness of the proposed soft balanced clustering algorithm is demonstrated on both synthetic and real text data.
  • Keywords
    computational complexity; data mining; maximum likelihood estimation; pattern clustering; probability; cluster constraints; data objects; graph partitioning; hard balancing; information-theoretic viewpoint; object membership; posterior probabilities; soft balanced clustering algorithms; soft model-based clustering; temperature parameter; Application software; Clustering algorithms; Clustering methods; Computer science; Data mining; Databases; Indexing; Maximum likelihood estimation; Partitioning algorithms; Temperature control;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2003. ICDM 2003. Third IEEE International Conference on
  • Print_ISBN
    0-7695-1978-4
  • Type
    conf
  • DOI
    10.1109/ICDM.2003.1250953
  • Filename
    1250953
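
The soft balancing constraint stated in the abstract (equal sums of posterior membership probabilities across clusters, with a temperature controlling softness) can be illustrated with a minimal sketch. This is not the authors' derivation: it assumes the constraint is enforced by alternating row and column rescaling (iterative proportional fitting) of temperature-scaled likelihoods. The function name `soft_balanced_posteriors` and the parameters `log_lik`, `temperature`, and `n_iter` are hypothetical, chosen only for this example.

```python
import math


def soft_balanced_posteriors(log_lik, temperature=1.0, n_iter=200):
    """Illustrative sketch only, not the method from the paper.

    log_lik[i][j] is the log-likelihood of object i under cluster model j.
    Returns post[i][j], where each row sums to 1 and each column sums to
    approximately n / k, so the expected cluster sizes are equal.
    """
    n, k = len(log_lik), len(log_lik[0])

    # Temperature-scaled affinities; the row maximum is subtracted for
    # numerical stability. Very small temperatures can still underflow.
    aff = []
    for row in log_lik:
        m = max(row)
        aff.append([math.exp((v - m) / temperature) for v in row])

    weights = [1.0] * k   # per-cluster scaling factors (assumed mechanism)
    target = n / k        # desired expected number of objects per cluster
    post = []
    for _ in range(n_iter):
        # Row step: each object's memberships are normalised to sum to 1.
        post = []
        for row in aff:
            scaled = [w * a for w, a in zip(weights, row)]
            total = sum(scaled)
            post.append([v / total for v in scaled])
        # Column step: rescale cluster weights toward equal expected sizes.
        col = [sum(post[i][j] for i in range(n)) for j in range(k)]
        weights = [w * target / max(c, 1e-300) for w, c in zip(weights, col)]
    return post


# Example: four objects favour cluster 0 and two favour cluster 1,
# yet the balanced posteriors give both clusters an expected size of 3.
example_log_lik = [
    [-0.1, -2.0], [-0.2, -1.5], [-0.3, -1.8],
    [-0.5, -1.0], [-1.2, -0.4], [-2.0, -0.1],
]
example_post = soft_balanced_posteriors(example_log_lik, temperature=1.0)
```

In this sketch, lowering `temperature` sharpens each object's memberships toward a single cluster while the column step keeps the expected sizes equal, which mirrors the qualitative behavior the abstract describes.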