DocumentCode
2730490
Title
A data locality optimization algorithm for large-scale data processing in Hadoop
Author
Zhao, Yanrong ; Wang, Weiping ; Meng, Dan ; Yang, Xiufeng ; Zhang, Shubin ; Li, Jun ; Guan, Gang
Author_Institution
Inst. of Comput. Technol., Grad. Univ., Beijing, China
fYear
2012
fDate
1-4 July 2012
Abstract
Data-intensive applications are increasingly designed to execute on large computing clusters. Our previous observations of Tencent production systems indicate that the join query is one of the most important queries in large-scale data processing. When a join query runs on the Hive system, its job is divided into a map phase and a reduce phase and requires transferring large amounts of intermediate results over the network, which is inefficient. In this paper, we propose an algorithm called CHMJ, whose general idea is to take advantage of data locality to accelerate computation. It comprises four parts: a data distribution strategy, a parallel HashMapJoin algorithm, CoLocation scheduling, and a delay scheduling strategy. CHMJ has been adopted in the Tencent data warehouse and plays an important role in Tencent's daily operations. Our experiments demonstrate the feasibility and efficiency of our solution.
Keywords
data handling; data warehouses; parallel processing; portals; query processing; scheduling; CHMJ algorithm; Hadoop; Hive system; Internet service portal; Tencent daily operation; Tencent data warehouse; Tencent production system; colocation scheduling; computing cluster; data distribution strategy; data locality optimization algorithm; data-intensive application; delay scheduling strategy; join query; large-scale data processing; map phase; parallel hashmapjoin algorithm; reduce phase; Algorithm design and analysis; Clustering algorithms; Data processing; Delay; Partitioning algorithms; Query processing; Scheduling; MapReduce
fLanguage
English
Publisher
IEEE
Conference_Titel
Computers and Communications (ISCC), 2012 IEEE Symposium on
Conference_Location
Cappadocia
ISSN
1530-1346
Print_ISBN
978-1-4673-2712-1
Electronic_ISBN
1530-1346
Type
conf
DOI
10.1109/ISCC.2012.6249372
Filename
6249372