DocumentCode
2397558
Title
Research and application of distributed parallel search hadoop algorithm
Author
AiLing Duan
Author_Institution
Sch. of Inf. Sci. & Eng., Henan Univ. of Technol., Zhengzhou, China
fYear
2012
fDate
19-20 May 2012
Firstpage
2462
Lastpage
2465
Abstract
Hadoop is an open source distributed parallel computing platform composed mainly of the MapReduce algorithm and a distributed file system. This paper introduces Hadoop and its related technologies and discusses in detail the idea and basic framework of the MapReduce algorithm, together with a parallelization method, and its feasibility, for the massive data involved in Internet search. The paper also puts forward an idea and strategy for using MapReduce to process webpage inverted indexes in parallel.
Keywords
Web services; file organisation; information retrieval; parallel algorithms; public domain software; search problems; Hadoop; Internet search; MapReduce algorithm; Web page inverted index; distributed file system; distributed parallel algorithm; open source computing; parallel processing; Distributed databases; Educational institutions; File systems; Indexes; Internet; Parallel processing; Servers; Hadoop; MapReduce algorithm; inverted index; parallel computing;
fLanguage
English
Publisher
IEEE
Conference_Titel
Systems and Informatics (ICSAI), 2012 International Conference on
Conference_Location
Yantai
Print_ISBN
978-1-4673-0198-5
Type
conf
DOI
10.1109/ICSAI.2012.6223552
Filename
6223552