• DocumentCode
    3426344
  • Title
    An architecture and algorithms for multi-run clustering
  • Author
    Jiamthapthaksin, Rachsuda ; Eick, Christoph F. ; Rinsurongkawong, Vadeerat
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Houston, Houston, TX
  • fYear
    2009
  • fDate
    March 30 2009-April 2 2009
  • Firstpage
    306
  • Lastpage
    313
  • Abstract
    This paper addresses two main challenges in clustering that require extensive human effort: selecting appropriate parameters for an arbitrary clustering algorithm and identifying alternative clusters. We propose an architecture and a concrete system, MR-CLEVER, for multi-run clustering that integrates active learning with clustering algorithms. The key hypothesis of this work is that better clustering results can be obtained by combining clusters that originate from multiple runs of clustering algorithms. By defining states that represent parameter settings of a clustering algorithm, the proposed architecture actively learns a state utility function. The utility of a parameter setting is assessed based on clustering run-time and on the quality and novelty of the obtained clusters. Furthermore, the utility function plays an important role in guiding the clustering algorithm to seek novel solutions; cluster novelty measures are introduced for this purpose. Finally, we also contribute a cluster summarization algorithm that assembles a final clustering as a combination of high-quality clusters originating from multiple runs. The merits of the proposed system are that it is generic, and can therefore be used in conjunction with different clustering algorithms, and that it reduces the human effort needed for selecting parameters, comparing clustering results, and assembling clustering results. We evaluate the proposed system in conjunction with a representative-based clustering algorithm, namely CLEVER, on a challenging data mining task involving an earthquake dataset. The obtained results demonstrate that, in comparison to the best single-run clustering, multi-run clustering discovers solutions of higher quality.
  • Keywords
    data mining; pattern clustering; arbitrary clustering algorithm; cluster summarization algorithm; data mining task; multi-run clustering algorithm; state utility function; Aggregates; Algorithm design and analysis; Assembly systems; Clustering algorithms; Concrete; Data mining; Earthquakes; Humans; Pollution measurement; Runtime
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence and Data Mining, 2009. CIDM '09. IEEE Symposium on
  • Conference_Location
    Nashville, TN
  • Print_ISBN
    978-1-4244-2765-9
  • Type
    conf
  • DOI
    10.1109/CIDM.2009.4938664
  • Filename
    4938664