• DocumentCode
    3455538
  • Title

    Finding the Optimal Number of Clusters from Artificial Datasets

  • Author

    Päivinen, Niina ; Grönfors, Tapio

  • Author_Institution
    Dept. of Comput. Sci., Kuopio Univ., Kuopio
  • fYear
    2006
  • fDate
    20-22 Aug. 2006
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    This study addresses the problem of selecting the right number of clusters. Scale-free minimum spanning trees (SFMSTs) were constructed from artificial test datasets, and both the number of clusters, based on the distribution of the edge lengths, and the clustering itself were obtained from the tree structure. As a reference, the nearest neighbor and k-means clustering methods were used, with the number of clusters determined by the largest average silhouette width criterion. The SFMST clustering method proved able to find the optimal number of clusters automatically from the dataset without any user-defined parameters.
  • Keywords
    pattern clustering; statistical distributions; trees (mathematics); artificial dataset; edge length distribution; k-means clustering method; largest average silhouette width criterium; nearest neighbor clustering method; optimal cluster selection problem; probability distribution; scale-free minimum spanning tree; Bridges; Clustering methods; Computer science; Data analysis; Histograms; Joining processes; Nearest neighbor searches; Probability distribution; Testing; Tree graphs;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Cybernetics, 2006. ICCC 2006. IEEE International Conference on
  • Conference_Location
    Budapest
  • Print_ISBN
    1-4244-0071-6
  • Electronic_ISBN
    1-4244-0072-4
  • Type

    conf

  • DOI
    10.1109/ICCCYB.2006.305691
  • Filename
    4097652