Distributed index mechanism based on Hadoop

Author

Qin Liu ; Ni Zhang ; Xiaowen Yang ; Hongming Zhu

Author_Institution

Sch. of software Eng., Tongji Univ., Shanghai, China

fYear

2014

fDate

Sept. 29 2014-Oct. 1 2014

Firstpage

1

Lastpage

7

Abstract

In recent years, MapReduce has attracted much attention. However, MapReduce has a weakness: it requires an entire block scan because it cannot precisely locate the query result. Some existing research has built indexes on Hadoop, but some of these indexes can only handle full-text search and cannot support datasets with a certain schema. There is not yet a general distributed unstructured-data index system, optimized for MapReduce, that can handle multi-schema datasets and support queries well both with and without an index. In this paper, we propose a distributed index mechanism and build it on MapReduce, which reduces query time and the number of map tasks in some contexts. Moreover, this distributed index mechanism supports multi-schema datasets, scales well, and is customizable. Our experiments show that the distributed index mechanism can save up to 30% of query time and 90% of map tasks in some contexts compared with the query performance of the original MapReduce framework, and the advantage grows as the dataset expands.
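The abstract does not describe the index structure itself. The following minimal Python sketch is an illustration only. It shows why an index can reduce the number of map tasks, under the following assumptions:

- A hypothetical block-level index maps (schema, field, value) to the IDs of the blocks that contain matching records.
- Only those blocks are turned into map tasks.
- When no index covers the queried field, the sketch falls back to a full block scan, as plain MapReduce would.

The names `build_block_index`, `plan_map_tasks`, and `run_query`, and the in-memory block layout, are hypothetical. They are not the authors' implementation or any Hadoop API.

```python
from collections import defaultdict


def build_block_index(blocks, indexed_fields):
    """Hypothetical block-level index: (schema, field, value) -> set of block IDs.

    blocks: {block_id: [(schema_name, record_dict), ...]}
    indexed_fields: {schema_name: [field, ...]}, so each schema can index different fields.
    """
    postings = defaultdict(set)
    for block_id, records in blocks.items():
        for schema, rec in records:
            for field in indexed_fields.get(schema, ()):
                if field in rec:
                    postings[(schema, field, rec[field])].add(block_id)
    return {"fields": indexed_fields, "postings": dict(postings)}


def plan_map_tasks(blocks, index, schema, field, value):
    """Choose which blocks to scan; in this model each block becomes one map task."""
    if index is None or field not in index["fields"].get(schema, ()):
        return sorted(blocks)  # no usable index: scan every block, like plain MapReduce
    return sorted(index["postings"].get((schema, field, value), set()))


def run_query(blocks, schema, field, value, index=None):
    """Return (matching records, number of map tasks used)."""
    task_blocks = plan_map_tasks(blocks, index, schema, field, value)
    matches = []
    for block_id in task_blocks:  # each iteration simulates one map task
        for rec_schema, rec in blocks[block_id]:
            if rec_schema == schema and rec.get(field) == value:
                matches.append(rec)
    return matches, len(task_blocks)


if __name__ == "__main__":
    blocks = {
        0: [("user", {"id": 1, "city": "Shanghai"}), ("order", {"id": 7, "amount": 30})],
        1: [("user", {"id": 2, "city": "Abuja"})],
        2: [("order", {"id": 8, "amount": 12})],
        3: [("user", {"id": 3, "city": "Shanghai"})],
    }
    idx = build_block_index(blocks, {"user": ["city"]})
    print(run_query(blocks, "user", "city", "Shanghai")[1])       # 4 map tasks (full scan)
    print(run_query(blocks, "user", "city", "Shanghai", idx)[1])  # 2 map tasks (blocks 0 and 3)
```

In this toy model, the map-task count drops because blocks are pruned before the job runs. Any saving in real query time would also depend on index lookup and maintenance costs, which the sketch ignores.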

Keywords

data handling; indexing; parallel processing; query processing; Hadoop; MapReduce framework; distributed index mechanism; distributed unstructured data index system; full-text search; multischema dataset; query performance; Indexes; MapReduce; hadoop; index; schema;

fLanguage

English

Publisher

IEEE

Conference_Titel

Electronics, Computer and Computation (ICECCO), 2014 11th International Conference on

Conference_Location

Abuja

Print_ISBN

978-1-4799-4108-7

Type

conf

DOI

10.1109/ICECCO.2014.6997544

Filename

6997544