Title :
Determining the number of clusters/segments in hierarchical clustering/segmentation algorithms
Author :
Salvador, Stan ; Chan, Philip
Author_Institution :
Dept. of Comput. Sci., Florida Inst. of Technol., Melbourne, FL, USA
Abstract :
Many clustering and segmentation algorithms suffer from the limitation that the number of clusters/segments must be specified by a human user. It is often impractical to expect a human with sufficient domain knowledge to be available to select the number of clusters/segments to return. We investigate techniques to determine the number of clusters or segments to return from hierarchical clustering and segmentation algorithms. We propose an efficient algorithm, the L method, that finds the "knee" in a "# of clusters vs. clustering evaluation metric" graph. Using the knee is a well-known, but not particularly well-understood, method to determine the number of clusters. We explore the feasibility of this method and attempt to determine in which situations it will and will not work. We also compare the L method to existing methods in terms of both the accuracy of the number of clusters determined and efficiency. Our results show favorable performance on these criteria compared to the existing methods that were evaluated.
Keywords :
database management systems; pattern clustering; unsupervised learning; hierarchical clustering algorithm; hierarchical segmentation algorithm; Clustering algorithms; Error correction; Humans; Knee; Machine learning algorithms; Multidimensional systems; Runtime; Statistical analysis; Testing
Conference_Title :
Tools with Artificial Intelligence, 2004. ICTAI 2004. 16th IEEE International Conference on
Print_ISBN :
0-7695-2236-X
DOI :
10.1109/ICTAI.2004.50
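
The abstract describes finding the "knee" in a "# of clusters vs. clustering evaluation metric" graph but does not spell out the procedure. The following is a minimal sketch of one way to locate such a knee, assuming it is taken to be the split point at which two straight lines, fitted to the left and right parts of the curve, give the lowest point-weighted RMSE. The function names `line_rmse` and `find_knee`, the weighting, and the sample data are illustrative assumptions, not details taken from this record.

```python
from math import sqrt


def line_rmse(xs, ys):
    """RMSE of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    sq_err = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return sqrt(sq_err / n)


def find_knee(xs, ys):
    """Return the x value that ends the left segment of the best two-line fit.

    Assumption: each candidate split is scored by the RMSE of the two fitted
    lines, weighted by the fraction of points on each side.
    """
    n = len(xs)
    if n < 4:
        raise ValueError("need at least four points to fit two lines")
    best_x, best_err = None, float("inf")
    # Left segment is xs[:split], right segment is xs[split:];
    # the range keeps at least two points on each side.
    for split in range(2, n - 1):
        left = line_rmse(xs[:split], ys[:split])
        right = line_rmse(xs[split:], ys[split:])
        total = (split / n) * left + ((n - split) / n) * right
        if total < best_err:
            best_x, best_err = xs[split - 1], total
    return best_x


# Hypothetical evaluation-metric curve: steep drop up to 4 clusters, then flat.
num_clusters = list(range(1, 11))
metric = [80, 60, 40, 20, 15, 14, 13, 12, 11, 10]
knee = find_knee(num_clusters, metric)
```

With this made-up curve the two segments are exactly linear, so the sketch selects the last point of the steep segment as the knee. Real evaluation curves are noisy, and how well a two-line fit isolates the knee depends on the data, which is the question the abstract says the paper examines.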