Title :
Combining K-means and semivariogram-based grid clustering
Author :
Trujillo, Maria ; Izquierdo, Ebroul
Author_Institution :
Dept. of Electron. Eng., London Univ.
Abstract :
Clustering is useful in many applications, including data mining, information retrieval, image segmentation, and data classification. This paper proposes an approach for grouping spatially indexed data sets, based on the k-means algorithm and grid clustering. The former is among the simplest and most commonly used clustering techniques, but it has a major drawback: it is sensitive to the selection of the initial partition. The latter is commonly used for grouping spatially indexed data. The goal of this paper is to overcome the high sensitivity of the k-means algorithm to its starting conditions by using the available spatial information. A semivariogram-based grid clustering is introduced, which uses the spatial correlation to determine the bin size. Since the bins are constrained to regular blocks while the spatial distribution of objects is not regular, we propose to combine this technique with a conventional k-means algorithm. In this way the semivariogram provides a good initialization for k-means. Experimental results show that the final partition preserves the spatial distribution of the objects.
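The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the classical empirical semivariogram estimator, the rule for reading off the range (first lag reaching 95% of the largest semivariance), the seeding rule (mean of the k most populated grid cells), the scalar attribute, and all function names below are assumptions made for illustration.

```python
import math
from collections import defaultdict


def empirical_semivariogram(coords, values, lag_width, n_lags):
    """gamma(h) = sum of (z_i - z_j)^2 over pairs in lag bin h, divided by 2 * N(h)."""
    sums, counts = [0.0] * n_lags, [0] * n_lags
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            lag = int(math.dist(coords[i], coords[j]) // lag_width)
            if lag < n_lags:
                sums[lag] += (values[i] - values[j]) ** 2
                counts[lag] += 1
    # None marks lag bins that contain no pairs.
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]


def estimate_range(gamma, lag_width, sill_fraction=0.95):
    """Assumed heuristic: upper edge of the first lag bin reaching a fraction of the sill."""
    sill = max(g for g in gamma if g is not None)
    for h, g in enumerate(gamma):
        if g is not None and g >= sill_fraction * sill:
            return (h + 1) * lag_width
    return len(gamma) * lag_width


def grid_seeds(coords, values, bin_size, k):
    """Assumed seeding rule: mean attribute value of the k most populated grid cells."""
    cells = defaultdict(list)
    for (x, y), z in zip(coords, values):
        cells[(int(x // bin_size), int(y // bin_size))].append(z)
    busiest = sorted(cells.values(), key=len, reverse=True)[:k]
    return [sum(c) / len(c) for c in busiest]


def nearest(z, centroids):
    """Index of the centroid closest to the scalar value z."""
    return min(range(len(centroids)), key=lambda c: abs(z - centroids[c]))


def kmeans_1d(values, seeds, n_iter=50):
    """Conventional k-means (Lloyd iterations) on scalar values, started from the seeds."""
    centroids = list(seeds)
    for _ in range(n_iter):
        groups = [[] for _ in centroids]
        for z in values:
            groups[nearest(z, centroids)].append(z)
        # An empty cluster keeps its previous centroid.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, [nearest(z, centroids) for z in values]


# Toy data (hypothetical): a 10 x 5 lattice with two spatial patches of different value.
coords = [(float(x), float(y)) for x in range(10) for y in range(5)]
values = [1.0 if x < 5 else 10.0 for x, _ in coords]

gamma = empirical_semivariogram(coords, values, lag_width=1.0, n_lags=8)
bin_size = estimate_range(gamma, lag_width=1.0)
seeds = grid_seeds(coords, values, bin_size, k=2)
centroids, labels = kmeans_1d(values, seeds)
```

Because the grid cells are regular blocks, a cell may straddle the boundary between the two patches and yield a seed that mixes both values. The k-means step is what moves such a seed onto the actual groups, which mirrors the motivation given in the abstract for combining the two techniques.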
Keywords :
pattern clustering; statistical analysis; bin size; data set grouping; k-means; semivariogram-based grid clustering; spatial correlation; spatial distribution; Clustering algorithms; Data analysis; Data engineering; Data mining; Electronic mail; Image retrieval; Image segmentation; Information retrieval; Partitioning algorithms; Pattern classification
Conference_Title :
ELMAR, 2005. 47th International Symposium
Conference_Location :
Zadar
Print_ISBN :
953-7044-01-4
DOI :
10.1109/ELMAR.2005.193628