DocumentCode :
3403882
Title :
Considering Data Skew in Multiway Joins for MapReduce
Author :
Lei Wu ; Changchun Zhang ; Haiyan Meng ; Jing Li
Author_Institution :
Sch. of Comput. Sci. & Technol., Univ. of Sci. & Technol. of China, Hefei, China
fYear :
2013
fDate :
22-23 Aug. 2013
Firstpage :
69
Lastpage :
73
Abstract :
Data analysis and processing are important tasks in cloud computing. MapReduce provides a cost-effective, flexible, fault-tolerant, and scalable distributed programming model for large clusters. However, implementing join operations efficiently in MapReduce remains an active research topic, and data skew has a strong impact on join performance. In this paper, we implement a sampling-based range partitioning method and apply it to multi-way joins to mitigate the effect of data skew. Our experiments show that this approach is more efficient than existing algorithms.
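The sampling-based range partitioning mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `sample_boundaries` and `range_partition`, the quantile rule for choosing split points, and the use of integer join keys are all assumptions made for illustration. The idea shown is that split points are taken from a sorted random sample of the join keys, so that each reducer receives a key range holding roughly the same number of records.

```python
import bisect
import random


def sample_boundaries(keys, num_partitions, sample_size, seed=0):
    """Pick num_partitions - 1 split points from a random sample of the keys.

    Hypothetical helper: evenly spaced quantiles of the sorted sample are
    used as range boundaries.
    """
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    return [
        sample[(i * len(sample)) // num_partitions]
        for i in range(1, num_partitions)
    ]


def range_partition(key, boundaries):
    """Return the index of the partition whose key range contains `key`."""
    # bisect_right sends a key equal to a boundary to the higher partition.
    return bisect.bisect_right(boundaries, key)


if __name__ == "__main__":
    # Skewed input: most keys are small, a few are spread over a wide range.
    rng = random.Random(1)
    keys = [rng.randint(0, 99) for _ in range(9000)]
    keys += [rng.randint(100, 100000) for _ in range(1000)]

    boundaries = sample_boundaries(keys, num_partitions=4, sample_size=500)
    loads = [0] * 4
    for k in keys:
        loads[range_partition(k, boundaries)] += 1
    print("boundaries:", boundaries)
    print("records per partition:", loads)
```

Note that this sketch balances ranges of keys only. A single key that occurs very frequently still lands in one partition, so handling such hot keys would need an additional technique that the abstract does not describe.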
Keywords :
cloud computing; data analysis; distributed programming; fault tolerant computing; performance evaluation; sampling methods; MapReduce; cloud computing; cost-effective flexible distributed programming model; data analysis; data processing; data skew problem; fault-tolerant distributed programming model; multiway join operation; scalable distributed programming model; Arrays; Cloud computing; Data processing; Distributed databases; Educational institutions; Partitioning algorithms; Query processing; Data skew; MapReduce; Multi-way joins;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
ChinaGrid Annual Conference (ChinaGrid), 2013 8th
Conference_Location :
Changchun
Print_ISBN :
978-0-7695-5058-9
Type :
conf
DOI :
10.1109/ChinaGrid.2013.8
Filename :
6623869