Title :
Research and application of MapReduce-based MST text clustering algorithm
Author :
Yang, Kehua ; He, Guoxiong ; He, Guohui
Author_Institution :
Hunan Univ., Changsha, China
Abstract :
In view of today's unprecedentedly diverse, discrete, and massive text data, this paper presents a distributed MST (minimum spanning tree) algorithm based on the MapReduce programming model. Building on this MST algorithm, a distributed MST text clustering algorithm is designed and implemented. The clustering algorithm is analyzed in three aspects: text feature vector extraction, graph construction, and MST construction. Experiments on sample data compare this algorithm with the ordinary MST clustering algorithm and with a MapReduce-based K-means clustering algorithm.
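The abstract names the pipeline (feature vectors, graph construction, MST construction, clustering) but not its details. The sketch below is a minimal, single-process illustration under stated assumptions, not the authors' algorithm. Term-frequency vectors and cosine distance stand in for the paper's feature extraction. The "map" and "reduce" steps are simulated in plain Python (no Hadoop API is used): each edge partition keeps only its local minimum spanning forest, and a final Kruskal pass over the merged forests yields a global MST. Clusters are then formed by cutting the k - 1 heaviest MST edges, a common MST clustering rule that is assumed here. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def tf_vector(text):
    """Assumed feature extraction: raw term-frequency vector of whitespace tokens."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Assumed edge weight: 1 - cosine similarity of two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


class DisjointSet:
    """Union-find with path halving, used by Kruskal's algorithm."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def kruskal(n, edges):
    """Minimum spanning forest of (weight, u, v) edges over vertices 0..n-1."""
    ds = DisjointSet(n)
    return [e for e in sorted(edges) if ds.union(e[1], e[2])]


def map_reduce_mst(n, edges, num_partitions):
    # Simulated "map": each partition reduces its edges to a local spanning forest.
    partitions = [edges[i::num_partitions] for i in range(num_partitions)]
    local_forests = [kruskal(n, part) for part in partitions]
    # Simulated "reduce": an MST of the union of local forests is an MST of the
    # whole graph, since every discarded edge is the heaviest on some cycle.
    merged = [e for forest in local_forests for e in forest]
    return kruskal(n, merged)


def mst_clusters(docs, k, num_partitions=2):
    """Hypothetical end-to-end helper: documents -> k clusters of document indices."""
    vectors = [tf_vector(d) for d in docs]
    n = len(docs)
    # Graph construction: complete graph weighted by cosine distance.
    edges = [(cosine_distance(vectors[i], vectors[j]), i, j)
             for i, j in combinations(range(n), 2)]
    mst = map_reduce_mst(n, edges, num_partitions)
    # Cut the k - 1 heaviest MST edges by keeping only the n - k lightest ones.
    kept = sorted(mst)[: max(n - k, 0)]
    ds = DisjointSet(n)
    for _, u, v in kept:
        ds.union(u, v)
    groups = {}
    for i in range(n):
        groups.setdefault(ds.find(i), []).append(i)
    return sorted(groups.values())
```

For example, `mst_clusters(["apple banana apple", "banana apple fruit", "car engine wheel", "engine car road"], 2)` groups the two fruit documents and the two car documents into separate clusters. Building the complete graph is quadratic in the number of documents; a real distributed implementation would also have to distribute that step, which this sketch does not attempt.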
Keywords :
pattern clustering; text analysis; trees (mathematics); MST construction; MapReduce programming model; MapReduce-based K-means clustering; MapReduce-based MST text clustering; distributed MST text clustering; graph construction; mass text data processing; minimum spanning tree; text feature vector extraction; Algorithm design and analysis; Clustering algorithms; Computational complexity; Feature extraction; Partitioning algorithms; Vectors
Conference_Title :
2012 International Conference on Information Science and Technology (ICIST)
Conference_Location :
Hubei
Print_ISBN :
978-1-4577-0343-0
DOI :
10.1109/ICIST.2012.6221748