• DocumentCode
    2664656
  • Title
    Design and Implementation of Parallel Term Contribution Algorithm Based on Mapreduce Model
  • Author
    Peng Chao; Wu Bin; Deng Chao
  • Author_Institution
    Beijing University of Posts and Telecommunications (BUPT), Beijing, China
  • fYear
    2012
  • fDate
    19-20 June 2012
  • Firstpage
    43
  • Lastpage
    47
  • Abstract
    MapReduce is a software framework introduced by Google in 2004 to support distributed computing on large datasets across clusters of computers [1]. The term contribution (TC) algorithm is a relatively new text-mining method for selecting features for text clustering. In this paper, we design and implement a parallel term contribution (PTC) algorithm based on the MapReduce model. Our experiments show that running TC on the MapReduce framework greatly improves its performance.
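The abstract does not define TC itself. In its usual formulation for text clustering, the contribution of a term t is TC(t) = Σ_{i≠j} f(t, D_i) · f(t, D_j), where f(t, D) is the TF-IDF weight of t in document D. Because Σ_{i≠j} x_i x_j = (Σ x_i)² − Σ x_i², each term's score needs only two per-term sums, which suggests a natural MapReduce decomposition. The Python sketch below is illustrative only and is not the PTC algorithm from the paper. It assumes two jobs (one for document frequencies, one for TC), unnormalized tf × log(N/df) weights, and an in-process stand-in for Hadoop's shuffle. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def run_mapreduce(records, mapper, reducer):
    """Simulate one MapReduce job in-process: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: collect all values for a key
    return {key: reducer(key, values) for key, values in groups.items()}


# Job 1 (hypothetical): document frequency df(t) of every term.
def df_mapper(doc):
    _doc_id, tokens = doc
    for term in set(tokens):  # count each term at most once per document
        yield term, 1


def df_reducer(_term, ones):
    return sum(ones)


# Job 2 (hypothetical): TC(t) from per-document TF-IDF weights.
def make_tc_mapper(df, n_docs):
    def tc_mapper(doc):
        _doc_id, tokens = doc
        for term, tf in Counter(tokens).items():
            w = tf * math.log(n_docs / df[term])  # assumed weighting: raw tf * log(N/df)
            yield term, (w, w * w)  # documents lacking the term have w = 0 and emit nothing
    return tc_mapper


def tc_reducer(_term, pairs):
    s = sum(w for w, _ in pairs)    # sum_i f(t, D_i)
    q = sum(sq for _, sq in pairs)  # sum_i f(t, D_i)^2
    return s * s - q                # sum over ordered pairs i != j of f(t, D_i) * f(t, D_j)


def term_contribution(docs):
    """docs: list of (doc_id, tokens). Returns {term: TC score}."""
    df = run_mapreduce(docs, df_mapper, df_reducer)
    return run_mapreduce(docs, make_tc_mapper(df, len(docs)), tc_reducer)


def select_features(tc, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    return [t for t, _ in sorted(tc.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]


if __name__ == "__main__":
    docs = [
        ("d1", "hadoop mapreduce cluster".split()),
        ("d2", "hadoop text mining".split()),
        ("d3", "text clustering feature".split()),
    ]
    tc = term_contribution(docs)
    print(select_features(tc, 2))  # ['hadoop', 'text']
```

Summing over ordered pairs gives twice the unordered-pair value, which does not change the ranking used for feature selection. On a real cluster, each mapper/reducer pair would run as a separate Hadoop job, with the df table from the first job shipped to the second job's mappers.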
  • Keywords
    data mining; parallel algorithms; pattern clustering; text analysis; Mapreduce model; PTC algorithm; clustering; computer cluster; distributed computing; parallel term contribution algorithm design; software framework; text mining; Algorithm design and analysis; Clustering algorithms; Computational modeling; Data models; Software algorithms; Text mining; Vectors; Feature Selection; Hadoop; MapReduce; Term Contribution Algorithm; Text Mining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Open Cirrus Summit (OCS), 2012 Seventh
  • Conference_Location
    Beijing
  • Type
    conf
  • DOI
    10.1109/OCS.2012.39
  • Filename
    6695839