Regional Information Center for Science and Technology (RICEST) - Multi-file queries performance improvement through data placement in Hadoop

DocumentCode :

2301268

Title :

Multi-file queries performance improvement through data placement in Hadoop

Author :

Yu Tang ; Abdulhay, E. ; Aihua Fan ; Sheng Su ; Gebreselassie, K.

Author_Institution :

Univ. of Electron. Sci. & Technol. of China, Chengdu, China

fYear :

2012

fDate :

29-31 Dec. 2012

Firstpage :

986

Lastpage :

991

Abstract :

Hadoop is popular for processing data-intensive jobs because of its data locality feature. However, the performance gained from this feature is currently limited by Hadoop's default block placement policy, which implicitly assumes that instances of MapReduce jobs access data from a single file. In contrast, multi-file queries such as indexing or aggregation queries need to process related data from more than one file, stored on different DataNodes of a cluster. In this paper we propose a Correlation-based Block Placement (CBP) algorithm that enhances the performance of these queries by placing related blocks on the same set of DataNodes. Furthermore, we developed a customized InputFormat that enables InputSplits to contain records from different files. Simulation results demonstrated that the number of migrating data blocks for CBP was insignificant. In contrast, under the default policy, the number of migrating data blocks increased significantly with the input dataset size. As a result, for any input dataset size, the performance of CBP exceeded that of the default policy.
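The abstract does not give the details of the CBP algorithm, so the following is only a minimal sketch of the general idea it names: blocks that are known to be related are placed on the same set of DataNodes. Everything here is an assumption made for illustration, including the function name `place_blocks`, the notion of a per-block "correlation key", the replication factor, and the least-loaded tie-breaking rule; none of it is taken from the paper or from the HDFS API.

```python
from collections import defaultdict


def place_blocks(blocks, datanodes, replication=3):
    """Assign every block sharing a correlation key to the same DataNodes.

    blocks      -- list of (block_id, correlation_key) pairs (hypothetical input)
    datanodes   -- list of DataNode names
    replication -- number of replicas per block (assumed, not from the paper)
    Returns a dict mapping block_id -> list of DataNode names.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of DataNodes")

    # Group block ids by their correlation key.
    groups = defaultdict(list)
    for block_id, key in blocks:
        groups[key].append(block_id)

    load = {node: 0 for node in datanodes}  # blocks stored per node so far
    placement = {}

    # Place the largest groups first; ties are broken by key for determinism.
    for key, members in sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        # Pick the least-loaded nodes as the shared target set for this group.
        targets = sorted(datanodes, key=lambda n: (load[n], n))[:replication]
        for block_id in members:
            placement[block_id] = list(targets)
        for node in targets:
            load[node] += len(members)

    return placement


if __name__ == "__main__":
    # Two files (a, b) whose blocks are correlated pairwise by key.
    demo_blocks = [("a1", "k1"), ("b1", "k1"), ("a2", "k2"), ("b2", "k2")]
    demo_nodes = ["dn1", "dn2", "dn3", "dn4"]
    for block, nodes in sorted(place_blocks(demo_blocks, demo_nodes, 2).items()):
        print(block, nodes)
```

In this toy run, blocks `a1` and `b1` share the nodes `dn1` and `dn2`, while `a2` and `b2` share `dn3` and `dn4`, so a task that reads a correlated pair can find both blocks locally on one node.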

Keywords :

distributed processing; query processing; CBP; Hadoop; aggregation query; block placement policy; correlation-based block placement algorithm; data locality feature; data placement; data-intensive job processing; indexing query; multifile queries performance improvement; multifile query; Block Placement; Correlation; Data locality; HDFS;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Computer Science and Network Technology (ICCSNT), 2012 2nd International Conference on

Conference_Location :

Changchun

Print_ISBN :

978-1-4673-2963-7

Type :

conf

DOI :

10.1109/ICCSNT.2012.6526092

Filename :

6526092

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2301268