DocumentCode
2696218
Title
Large-scale document similarity computation based on cloud computing platform
Author
He, Chaobo ; Tang, Yong ; Tang, Feiyi ; Yang, Atiao
Author_Institution
Dept. of Comput., South China Normal Univ., Guangzhou, China
fYear
2011
fDate
26-28 Oct. 2011
Firstpage
175
Lastpage
179
Abstract
Current approaches to large-scale document similarity computation suffer from low efficiency. To improve on them, this paper proposes a new approach based on a cloud computing platform. The approach computes document similarity using the traditional vector space model and applies the MapReduce computation model to parallelize the distributed inverted index and the similarity computation. We first discuss the disadvantages of traditional approaches, then present the structure of the distributed inverted index, the architecture of the cloud computing platform, and the core algorithms based on the MapReduce computation model. Finally, we report related experiments. With this approach, large-scale document similarity computation can be run more effectively and with better scalability.
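The abstract names the ingredients (vector space model, inverted index, MapReduce) but does not give the algorithms themselves. The following is a minimal single-process sketch of how those pieces commonly fit together, not the authors' implementation. The function names, the whitespace tokenizer, and the use of raw term frequencies as weights are all assumptions made for illustration; a real deployment would run the map and reduce steps on a cluster framework and would likely use TF-IDF weights.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def map_index(doc_id, text):
    """Map phase 1 (hypothetical): emit (term, (doc_id, tf)) for one document."""
    for term, tf in Counter(text.lower().split()).items():
        yield term, (doc_id, tf)


def reduce_index(pairs):
    """Reduce phase 1: group postings by term to form the inverted index."""
    index = defaultdict(list)
    for term, posting in pairs:
        index[term].append(posting)
    return index


def map_pairs(postings):
    """Map phase 2: for one term, emit a partial dot product per document pair."""
    for (d1, w1), (d2, w2) in combinations(sorted(postings), 2):
        yield (d1, d2), w1 * w2


def cosine_similarities(docs):
    """Run both phases in-process and return {(doc_a, doc_b): cosine}."""
    emitted = [kv for doc_id, text in docs.items() for kv in map_index(doc_id, text)]
    index = reduce_index(emitted)

    # Squared vector norms, accumulated from the same postings.
    sq_norm = defaultdict(float)
    for postings in index.values():
        for doc_id, tf in postings:
            sq_norm[doc_id] += tf * tf

    # Reduce phase 2: sum partial products per pair, then normalize.
    dot = defaultdict(float)
    for postings in index.values():
        for pair, product in map_pairs(postings):
            dot[pair] += product
    return {
        (a, b): value / math.sqrt(sq_norm[a] * sq_norm[b])
        for (a, b), value in dot.items()
    }


# Toy corpus (made up for this sketch).
docs = {
    "d1": "cloud computing platform",
    "d2": "cloud computing index",
    "d3": "vector space model",
}
sims = cosine_similarities(docs)
```

Only document pairs that share at least one term ever produce a key in the second phase, which is the usual reason for routing the computation through an inverted index instead of comparing every pair of documents directly.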
Keywords
cloud computing; document handling; MapReduce computation model; cloud computing platform; distributed inverted index; large scale document similarity computation; vector model space; Computational modeling; DSL; Indexes; Monitoring; document similarity; map-reduce; vector space model
fLanguage
English
Publisher
IEEE
Conference_Titel
Pervasive Computing and Applications (ICPCA), 2011 6th International Conference on
Conference_Location
Port Elizabeth
Print_ISBN
978-1-4577-0209-9
Type
conf
DOI
10.1109/ICPCA.2011.6106499
Filename
6106499