Title :
Heuristic based approach to clustering and its time critical applications
Author :
Chen, Alan Chia-Lung ; Gao, Shang ; Alhajj, Reda ; Karampelas, Panagiotis
Author_Institution :
Dept of Comput. Sci., Univ. of Calgary, Calgary, AB, Canada
Abstract :
Clustering has been studied by the research community since the 1960s. However, as databases continue to grow in size, numerous research studies have been undertaken to develop more efficient clustering algorithms and to improve the performance of existing ones. This paper demonstrates a general optimization technique applicable to clustering algorithms that need to calculate distances and check them against some minimum distance condition. The technique is a simple calculation that finds the minimum possible distance between two points and checks this distance against the minimum distance condition, thus reusing already computed values and reducing how often a more complicated distance function must be computed. The proposed technique has been applied to the agglomerative hierarchical clustering, k-means clustering, and DBSCAN algorithms with successful results. Runtimes for all three algorithms were reduced with this optimization, and the clusters they returned were verified to be the same as those of the original algorithms. The technique also shows potential for substantial runtime reductions on large databases, with the savings growing as databases grow larger.
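The abstract does not spell out how the minimum possible distance is calculated. As an illustration only, the sketch below assumes one common way to obtain such a lower bound: the reverse triangle inequality against a single reference point, with each point's distance to that reference computed once and reused. The function names `euclidean` and `neighbors_within`, the parameter `eps`, and the choice of reference point are hypothetical and are not taken from the paper.

```python
import math


def euclidean(a, b):
    """Full (comparatively expensive) distance function."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def neighbors_within(points, eps, ref=None):
    """Return (pairs, full_calls): all index pairs (i, j) with distance <= eps,
    and how many times the full distance was evaluated in the pair loop.

    Assumption (not from the paper): the lower bound comes from the reverse
    triangle inequality, |d(p, r) - d(q, r)| <= d(p, q), for a reference r.
    """
    if ref is None:
        ref = points[0]  # hypothetical choice of reference point
    # Distances to the reference are computed once and reused for every pair.
    ref_dist = [euclidean(p, ref) for p in points]

    pairs = []
    full_calls = 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            # Cheap lower bound on the true distance between points i and j.
            lower_bound = abs(ref_dist[i] - ref_dist[j])
            if lower_bound > eps:
                continue  # the pair cannot satisfy the condition; skip it
            full_calls += 1
            if euclidean(points[i], points[j]) <= eps:
                pairs.append((i, j))
    return pairs, full_calls
```

For the four points `(0, 0)`, `(1, 0)`, `(10, 0)`, `(11, 0)` with `eps = 1.5`, this sketch evaluates the full distance for only two of the six pairs and returns the same pairs as an exhaustive check. Because the bound never exceeds the true distance, skipping a pair cannot change the result, which is consistent with the abstract's statement that the returned clusters remain the same.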
Keywords :
optimisation; pattern clustering; DBSCAN algorithms; agglomerative hierarchical clustering; clustering algorithms; k-means clustering; minimum distance condition; optimization technique; time critical applications; Algorithm design and analysis; Databases; Optimization; Performance evaluation; Runtime; Symmetric matrices; DBSCAN; density-based clustering; distance computation; hierarchical clustering; performance
Conference_Title :
Information Reuse and Integration (IRI), 2010 IEEE International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
978-1-4244-8097-5
DOI :
10.1109/IRI.2010.5558969