• DocumentCode
    2696218
  • Title

    Large-scale document similarity computation based on cloud computing platform

  • Author

    He, Chaobo ; Tang, Yong ; Tang, Feiyi ; Yang, Atiao

  • Author_Institution
    Dept. of Comput., South China Normal Univ., Guangzhou, China
  • fYear
    2011
  • fDate
    26-28 Oct. 2011
  • Firstpage
    175
  • Lastpage
    179
  • Abstract
    Current approaches to large-scale document similarity computation suffer from low efficiency. To improve on them, this paper proposes a new approach based on a cloud computing platform. The approach computes document similarity using the traditional vector space model and applies the MapReduce computation model to parallelize both the distributed inverted index and the similarity computation. We first discuss the disadvantages of the traditional approaches, then present the structure of the distributed inverted index, the architecture of the cloud computing platform, and the core algorithms based on the MapReduce computation model. Finally, we report some related experiments. With this approach, large-scale document similarity computation can be run more effectively and with greater scalability.
  • Keywords
    cloud computing; document handling; MapReduce computation model; cloud computing platform; distributed inverted index; large scale document similarity computation; vector space model; Computational modeling; DSL; Indexes; Monitoring; document similarity; map-reduce
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pervasive Computing and Applications (ICPCA), 2011 6th International Conference on
  • Conference_Location
    Port Elizabeth
  • Print_ISBN
    978-1-4577-0209-9
  • Type

    conf

  • DOI
    10.1109/ICPCA.2011.6106499
  • Filename
    6106499