DocumentCode :
243750
Title :
A New Fast Minimum Spanning Tree-Based Clustering Technique
Author :
Xiaochun Wang ; Xia L. Wang ; Jihua Zhu
Author_Institution :
Sch. of Software Eng., Xi'an Jiaotong Univ., Xi'an, China
fYear :
2014
fDate :
14 Dec. 2014
Firstpage :
1053
Lastpage :
1060
Abstract :
Clustering has important applications in data mining, and many techniques have been developed for it. Today's real-world databases typically hold millions of items with many thousands of fields, producing datasets that reach terabytes in size. At this scale many traditional clustering techniques become increasingly limited, and computationally efficient approaches have grown in popularity. This paper proposes a new, efficient two-phase approach to graph-theoretical clustering based on a minimum spanning tree representation of a dataset. In the first phase, the standard Prim's algorithm is modified so that the tree can be constructed efficiently using k-nearest neighbor search mechanisms; during construction, a new edge weight is defined to maximize intra-cluster similarity and minimize inter-cluster similarity. In the second phase, following the intuition that data points in the same cluster are closer to one another than to points in different clusters, the longest edges of the minimum spanning tree from the first phase are removed to form clusters, as in standard minimum spanning tree-based clustering algorithms. Experiments on synthetic and real datasets show that the proposed approach performs well compared with state-of-the-art methods.
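The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: it uses plain Euclidean distance rather than the paper's new edge weight, a brute-force neighbor search rather than an indexing structure, and an unmodified Prim's algorithm restricted to k-nearest-neighbor edges. The function names (`knn_edges`, `prim_forest`, `mst_clusters`) and the parameters `k` and `n_clusters` are hypothetical and chosen only for illustration.

```python
import heapq
import math


def knn_edges(points, k):
    """Brute-force k-nearest-neighbor graph as an undirected adjacency list."""
    adj = [[] for _ in points]
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in dists[:k]:
            adj[i].append((d, j))
            adj[j].append((d, i))  # mirror the edge so the graph is undirected
    return adj


def prim_forest(adj):
    """Prim's algorithm over the k-NN edges only.

    If the k-NN graph is disconnected, Prim's is restarted from each
    unvisited node, so the result is a minimum spanning forest.
    """
    visited = [False] * len(adj)
    tree = []  # entries are (weight, u, v)
    for start in range(len(adj)):
        if visited[start]:
            continue
        visited[start] = True
        heap = [(d, start, j) for d, j in adj[start]]
        heapq.heapify(heap)
        while heap:
            d, u, v = heapq.heappop(heap)
            if visited[v]:
                continue
            visited[v] = True
            tree.append((d, u, v))
            for d2, w in adj[v]:
                if not visited[w]:
                    heapq.heappush(heap, (d2, v, w))
    return tree


def mst_clusters(points, n_clusters, k=3):
    """Phase 1: build the tree. Phase 2: cut the longest edges and label components."""
    n = len(points)
    tree = sorted(prim_forest(knn_edges(points, k)))
    n_components = n - len(tree)  # a forest with e edges on n nodes has n - e trees
    n_cut = max(0, n_clusters - n_components)
    kept = tree[: len(tree) - n_cut]  # drop the n_cut longest edges

    parent = list(range(n))  # union-find over the remaining edges

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, u, v in kept:
        parent[find(u)] = find(v)

    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    print(mst_clusters(pts, n_clusters=2, k=4))
```

Restricting Prim's algorithm to k-nearest-neighbor edges avoids examining all pairs during tree construction, which is the general idea the abstract points to; the brute-force search used here is still quadratic and stands in for whatever indexing structure a real implementation would use.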
Keywords :
data mining; learning (artificial intelligence); pattern classification; pattern clustering; search problems; trees (mathematics); Prim algorithm; data mining; data points; datasets; edge weight; graph-theoretical clustering; inter-cluster similarity; intra-cluster similarity; k-nearest neighbor search mechanisms; minimum spanning tree representation; minimum spanning tree-based clustering technique; real-world databases; Algorithm design and analysis; Arrays; Clustering algorithms; Educational institutions; Image edge detection; Partitioning algorithms; Standards; clustering; indexing structure; k-nearest neighbor search; minimum spanning tree;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining Workshop (ICDMW), 2014 IEEE International Conference on
Conference_Location :
Shenzhen
Print_ISBN :
978-1-4799-4275-6
Type :
conf
DOI :
10.1109/ICDMW.2014.139
Filename :
7022713