Title :
Parallel Text Clustering Based on MapReduce
Author :
Cao Zewen ; Zhou Yao
Author_Institution :
Coll. of Inf. Syst. & Manage., Nat. Univ. of Defense Technol., Changsha, China
Abstract :
This paper analyzes the challenges that ordinary text clustering algorithms face and proposes cloud computing as a feasible solution. The classical Jarvis-Patrick (JP) algorithm was adopted as a case study. It was implemented using the MapReduce programming model and tested on the cloud computing platform Hadoop with the Sogou corpus provided by the Sogou laboratory. The experimental results demonstrate that the text clustering algorithm can be parallelized in the MapReduce framework and that the parallel algorithm can handle massive textual data with better time performance.
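The abstract does not say how the JP algorithm is split into MapReduce jobs. What follows is a minimal sketch, assuming documents arrive as term-weight vectors compared by cosine similarity. A hypothetical in-memory `run_mapreduce` driver stands in for Hadoop. The names `index_mapper`, `pair_reducer`, `k` and `kt` are illustrative and are not taken from the paper. The sketch chains three jobs: an inverted index that emits partial dot products, a per-pair sum, and top-k neighbour selection. It then applies the classical Jarvis-Patrick rule on the driver, which puts two documents in the same cluster when each is among the other's k nearest neighbours and they share at least kt neighbours. As a simplification, shared neighbours are counted without the two documents themselves.

```python
import math
from collections import defaultdict
from itertools import combinations


def run_mapreduce(records, mapper, reducer):
    """Hypothetical in-memory stand-in for one MapReduce job: map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: group values by key
    out = []
    for key in sorted(groups):
        out.extend(reducer(key, groups[key]))
    return out


def index_mapper(record):
    # Job 1 map: L2-normalise the document vector and emit (term, (doc_id, weight)).
    doc_id, vec = record
    norm = math.sqrt(sum(w * w for w in vec.values()))
    if norm == 0:
        return
    for term, w in vec.items():
        yield term, (doc_id, w / norm)


def pair_reducer(term, postings):
    # Job 1 reduce: every pair of documents sharing this term gets one partial dot product.
    for (i, wi), (j, wj) in combinations(sorted(postings), 2):
        yield (i, j), wi * wj


def sum_reducer(pair, partials):
    # Job 2 reduce: summing the partial products gives the cosine similarity of the pair.
    yield pair, sum(partials)


def knn_mapper(record):
    # Job 3 map: each similarity is a neighbour candidate for both documents.
    (i, j), sim = record
    yield i, (sim, j)
    yield j, (sim, i)


def make_knn_reducer(k):
    def knn_reducer(doc_id, candidates):
        # Job 3 reduce: keep the k most similar documents (ties broken by id).
        top = sorted(candidates, key=lambda c: (-c[0], c[1]))[:k]
        yield doc_id, frozenset(j for _, j in top)
    return knn_reducer


def jarvis_patrick(docs, k, kt):
    pairs = run_mapreduce(docs.items(), index_mapper, pair_reducer)
    sims = run_mapreduce(pairs, lambda kv: [kv], sum_reducer)
    nn = dict(run_mapreduce(sims, knn_mapper, make_knn_reducer(k)))

    # Driver side: JP linking rule plus union-find to collect connected clusters.
    parent = {d: d for d in docs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, neighbors in nn.items():
        for j in neighbors:
            if i < j and i in nn.get(j, ()) and len(neighbors & nn[j]) >= kt:
                parent[find(i)] = find(j)

    clusters = defaultdict(set)
    for d in docs:
        clusters[find(d)].add(d)
    return sorted(sorted(c) for c in clusters.values())


# Toy corpus (illustrative only): two topics with three short documents each.
TOY_DOCS = {
    0: {"cloud": 1, "hadoop": 1, "mapreduce": 1},
    1: {"cloud": 1, "hadoop": 1},
    2: {"hadoop": 1, "mapreduce": 1},
    3: {"football": 1, "match": 1},
    4: {"football": 1, "goal": 1},
    5: {"match": 1, "goal": 1},
}

if __name__ == "__main__":
    print(jarvis_patrick(TOY_DOCS, k=2, kt=1))
```

On the toy corpus, `k=2, kt=1` yields the two topic groups `[[0, 1, 2], [3, 4, 5]]`. Raising `kt` to 2 leaves every document as a singleton. Computing similarities through an inverted index means that only document pairs sharing at least one term produce any work, which suits sparse text vectors.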
Keywords :
cloud computing; parallel algorithms; pattern clustering; text analysis; Hadoop; JP algorithm; Jarvis-Patrick algorithm; MapReduce programming mode; Sogou corpus; Sogou laboratory; cloud computing platform; parallel algorithm; parallel text clustering; textual data; Algorithm design and analysis; Cloud computing; Clustering algorithms; Computational modeling; Indexes; Sparse matrices; Vectors; Cloud Computing; MapReduce; Text Clustering;
Conference_Title :
2012 Second International Conference on Cloud and Green Computing (CGC)
Conference_Location :
Xiangtan
Print_ISBN :
978-1-4673-3027-5
DOI :
10.1109/CGC.2012.128