Large-scale document similarity computation based on cloud computing platform

Author

He, Chaobo ; Tang, Yong ; Tang, Feiyi ; Yang, Atiao

Author_Institution

Dept. of Comput., South China Normal Univ., Guangzhou, China

fYear

2011

fDate

26-28 Oct. 2011

Firstpage

175

Lastpage

179

Abstract

Current approaches to large-scale document similarity computation suffer from low efficiency. To improve on them, this paper proposes a new approach based on a cloud computing platform. The approach computes document similarity with the traditional vector space model and applies the MapReduce computation model to parallelize construction of the distributed inverted index and the similarity computation. We first discuss the disadvantages of the traditional approaches. We then present the structure of the distributed inverted index, the architecture of the cloud computing platform, and the core algorithms based on the MapReduce computation model. Finally, we report related experiments. With this approach, large-scale document similarity computation runs more effectively and scales better.
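The two MapReduce stages named in the abstract (building an inverted index, then computing similarity from it) can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes unit-length TF-IDF weights, so that a dot product equals cosine similarity. It also assumes a hypothetical in-memory driver, `run_mapreduce`, that stands in for a Hadoop job (map, shuffle by key, reduce). All function names here are illustrative and do not come from the paper. The pattern shown is the common two-job approach: emit postings per term, then emit one partial product per document pair per shared term and sum them.

```python
import math
from collections import defaultdict
from itertools import combinations


def run_mapreduce(records, mapper, reducer):
    """Hypothetical in-memory stand-in for one MapReduce job: map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)  # shuffle: group values by key
    output = []
    for key in sorted(groups):  # deterministic reduce order
        output.extend(reducer(key, groups[key]))
    return output


def tfidf_vectors(docs):
    """Map doc_id -> unit-length TF-IDF vector, so a dot product is a cosine similarity."""
    n = len(docs)
    df = defaultdict(int)
    for text in docs.values():
        for term in set(text.split()):
            df[term] += 1
    vectors = {}
    for doc_id, text in docs.items():
        tf = defaultdict(int)
        for term in text.split():
            tf[term] += 1
        weights = {t: c * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        # Terms that occur in every document get idf 0 and are dropped.
        vectors[doc_id] = {t: w / norm for t, w in weights.items() if w > 0}
    return vectors


# Job 1: build the inverted index (term -> postings of (doc_id, weight)).
def index_mapper(doc_id, vector):
    for term, weight in vector.items():
        yield term, (doc_id, weight)


def index_reducer(term, postings):
    yield term, sorted(postings)  # sorted by doc_id, so pairs come out as (smaller, larger)


# Job 2: each postings list contributes partial dot products for its document pairs.
def pair_mapper(term, postings):
    for (d1, w1), (d2, w2) in combinations(postings, 2):
        yield (d1, d2), w1 * w2


def sum_reducer(pair, partials):
    yield pair, sum(partials)


def pairwise_similarity(docs):
    """Cosine similarity for every document pair sharing at least one weighted term."""
    vectors = tfidf_vectors(docs)
    index = run_mapreduce(vectors.items(), index_mapper, index_reducer)
    return dict(run_mapreduce(index, pair_mapper, sum_reducer))
```

For example, with `docs = {"d1": "cloud computing mapreduce", "d2": "cloud computing inverted index", "d3": "vector space model"}`, `pairwise_similarity(docs)` returns a single entry for `("d1", "d2")`, roughly 0.16. `d3` shares no term with the others, so it never appears in a pair. The pair mapper emits a quadratic number of pairs for each frequent term, so implementations often prune very high-frequency terms. That pruning trades exactness for speed and is not shown here.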

Keywords

cloud computing; document handling; MapReduce computation model; cloud computing platform; distributed inverted index; large scale document similarity computation; vector model space; Computational modeling; DSL; Indexes; Monitoring; document similarity; map-reduce; vector space model

fLanguage

English

Publisher

IEEE

Conference_Title

Pervasive Computing and Applications (ICPCA), 2011 6th International Conference on

Conference_Location

Port Elizabeth

Print_ISBN

978-1-4577-0209-9

Type

conf

DOI

10.1109/ICPCA.2011.6106499

Filename

6106499