DocumentCode :
2544172
Title :
Parallel Text Clustering Based on MapReduce
Author :
Cao Zewen ; Zhou Yao
Author_Institution :
Coll. of Inf. Syst. & Manage., Nat. Univ. of Defense Technol., Changsha, China
fYear :
2012
fDate :
1-3 Nov. 2012
Firstpage :
226
Lastpage :
229
Abstract :
This paper analyzes the challenges facing ordinary text clustering algorithms and proposes cloud computing as a feasible solution. The classical Jarvis-Patrick (JP) algorithm was adapted as a case study. It was implemented using the MapReduce programming model and tested on the Hadoop cloud computing platform with the Sogou corpus provided by Sogou Laboratory. The experimental results demonstrate that a text clustering algorithm can be parallelized in the MapReduce framework, and that the parallel algorithm can handle massive textual data and achieve better time performance.
Keywords :
cloud computing; parallel algorithms; pattern clustering; text analysis; Hadoop; JP algorithm; Jarvis-Patrick algorithm; MapReduce programming mode; Sogou corpus; Sogou laboratory; cloud computing platform; parallel algorithm; parallel text clustering; textual data; Algorithm design and analysis; Cloud computing; Clustering algorithms; Computational modeling; Indexes; Sparse matrices; Vectors; Cloud Computing; MapReduce; Text Clustering;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cloud and Green Computing (CGC), 2012 Second International Conference on
Conference_Location :
Xiangtan
Print_ISBN :
978-1-4673-3027-5
Type :
conf
DOI :
10.1109/CGC.2012.128
Filename :
6382822