DocumentCode
2664656
Title
Design and Implementation of Parallel Term Contribution Algorithm Based on Mapreduce Model
Author
Peng Chao ; Wu Bin ; Deng Chao
Author_Institution
Beijing Univ. of Posts & Telecommun. BUPT, Beijing, China
fYear
2012
fDate
19-20 June 2012
Firstpage
43
Lastpage
47
Abstract
MapReduce is a software framework introduced by Google in 2004 to support distributed computing on large datasets across clusters of computers [1]. Term contribution (TC) is a relatively new text-mining algorithm that selects features for clustering. In this paper, we design and implement a parallel term contribution (PTC) algorithm based on the MapReduce model. Our experiments show that the MapReduce framework greatly improves the performance of TC.
Keywords
data mining; parallel algorithms; pattern clustering; text analysis; Mapreduce model; PTC algorithm; clustering; computer cluster; distributed computing; parallel term contribution algorithm design; software framework; text mining; Algorithm design and analysis; Clustering algorithms; Computational modeling; Data models; Software algorithms; Text mining; Vectors; Feature Selection; Hadoop; MapReduce; Term Contribution Algorithm; Text Mining
fLanguage
English
Publisher
IEEE
Conference_Titel
Open Cirrus Summit (OCS), 2012 Seventh
Conference_Location
Beijing
Type
conf
DOI
10.1109/OCS.2012.39
Filename
6695839