DocumentCode
3199354
Title
Grouping Blocks for MapReduce Co-Locality
Author
Xiao Yu ; Bo Hong
Author_Institution
Georgia Inst. of Technol., Atlanta, GA, USA
fYear
2015
fDate
25-29 May 2015
Firstpage
271
Lastpage
280
Abstract
Avoiding off-switch communication is critical to the performance of MapReduce/Hadoop clusters. Current efforts in Hadoop focus only on minimizing off-switch communication for map tasks, yet reduce tasks still shuffle data across the whole cluster because file blocks (and hence map tasks) are scattered. In this paper, we argue that grouping blocks in a few racks can greatly decrease the amount of off-switch data exchange and therefore shorten the execution time of jobs in the cluster. We propose mechanisms to place data in a grouped fashion and to schedule tasks accordingly. We explore the trade-off between reduced off-switch communication and loss of parallelism, and we discuss methods to mitigate the loss of parallelism. Extensive experiments show that our method significantly reduces off-switch communication and, as a result, decreases job execution time by up to 56%.
Keywords
electronic data interchange; parallel processing; pattern clustering; scheduling; Hadoop cluster; MapReduce cluster; MapReduce co-locality; job execution time; map task; off-switch communication; off-switch data exchange; task scheduling; task shuffle data; Bandwidth; History; Parallel processing; Runtime; Schedules; Scheduling; Switches; Group Data Blocks; Map/Reduce Co-locality; MapReduce/Hadoop;
fLanguage
English
Publisher
ieee
Conference_Titel
Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International
Conference_Location
Hyderabad
ISSN
1530-2075
Type
conf
DOI
10.1109/IPDPS.2015.16
Filename
7161516