• DocumentCode
    1415404
  • Title

    A Link-Based Cluster Ensemble Approach for Categorical Data Clustering

  • Author

    Iam-On, Natthakan ; Boongoen, T. ; Garrett, Simon ; Price, Chris

  • Author_Institution
    Sch. of Inf. Technol., Mae Fah Luang Univ., Chiang Rai, Thailand
  • Volume
    24
  • Issue
    3
  • fYear
    2012
  • fDate
    3/1/2012 12:00:00 AM
  • Firstpage
    413
  • Lastpage
    425
  • Abstract
    Although attempts have been made to solve the problem of clustering categorical data via cluster ensembles, with the results being competitive with conventional algorithms, it is observed that these techniques unfortunately generate a final data partition based on incomplete information. The underlying ensemble-information matrix presents only cluster-data point relations, with many entries being left unknown. The paper presents an analysis that suggests this problem degrades the quality of the clustering result, and it presents a new link-based approach, which improves the conventional matrix by discovering unknown entries through similarity between clusters in an ensemble. In particular, an efficient link-based algorithm is proposed for the underlying similarity assessment. Afterward, to obtain the final clustering result, a graph partitioning technique is applied to a weighted bipartite graph that is formulated from the refined matrix. Experimental results on multiple real data sets suggest that the proposed link-based method almost always outperforms both conventional clustering algorithms for categorical data and well-known cluster ensemble techniques.
  • Keywords
    data mining; graph theory; matrix algebra; pattern clustering; categorical data clustering; clustering result quality; data partition; ensemble-information matrix; graph partitioning technique; link-based cluster ensemble approach; underlying similarity assessment; weighted bipartite graph; Algorithm design and analysis; Atmospheric measurements; Clustering algorithms; Current measurement; Data mining; Partitioning algorithms; Transforms; Clustering; categorical data; cluster ensembles; data mining; link-based similarity
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2010.268
  • Filename
    5677529
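
The matrix refinement described in the abstract can be illustrated with a rough sketch. It builds a cluster-data point matrix from a small ensemble and fills the otherwise unknown entries with a similarity between clusters. The similarity used here (shared neighbours in a cluster network weighted by Jaccard overlap, scaled by a `decay` factor) is a simplified stand-in chosen for illustration and is not taken from the paper. All function names and the `decay` parameter are hypothetical, and the final graph partitioning step is omitted.

```python
from itertools import combinations


def build_clusters(ensemble):
    """List every cluster in the ensemble as (clustering_index, member_set)."""
    clusters = []
    for m, labels in enumerate(ensemble):
        for label in sorted(set(labels)):
            members = frozenset(i for i, l in enumerate(labels) if l == label)
            clusters.append((m, members))
    return clusters


def jaccard(a, b):
    """Jaccard overlap of two non-empty member sets."""
    return len(a & b) / len(a | b)


def link_similarity(clusters, decay=0.9):
    """Assumed similarity: two clusters are alike if they share neighbours
    in a network whose edge weights are Jaccard overlaps of cluster members."""
    n = len(clusters)
    w = [[jaccard(clusters[a][1], clusters[b][1]) if a != b else 0.0
          for b in range(n)] for a in range(n)]
    raw = [[0.0] * n for _ in range(n)]
    for a, b in combinations(range(n), 2):
        shared = sum(min(w[a][k], w[b][k])
                     for k in range(n) if k != a and k != b)
        raw[a][b] = raw[b][a] = shared
    top = max(max(row) for row in raw) or 1.0  # avoid division by zero
    return [[decay * raw[a][b] / top for b in range(n)] for a in range(n)]


def refined_matrix(ensemble, decay=0.9):
    """Rows are data points, columns are clusters. A known membership is 1.0;
    an unknown entry becomes the similarity between that cluster and the
    cluster the point belongs to in the same base clustering."""
    clusters = build_clusters(ensemble)
    sim = link_similarity(clusters, decay)
    rows = []
    for i in range(len(ensemble[0])):
        row = []
        for c, (m, members) in enumerate(clusters):
            if i in members:
                row.append(1.0)
            else:
                own = next(k for k, (mk, mem) in enumerate(clusters)
                           if mk == m and i in mem)
                row.append(sim[c][own])
        rows.append(row)
    return rows
```

Each inner list passed to `refined_matrix` is one base clustering, giving a cluster label per data point. The resulting rows could then serve as edge weights of a weighted bipartite graph between data points and clusters, which a graph partitioning routine would cut to obtain the final clustering.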