DocumentCode :
104498
Title :
Virtual Shuffling for Efficient Data Movement in MapReduce
Author :
Weikuan Yu ; Yandong Wang ; Xinyu Que ; Cong Xu
Author_Institution :
Dept. of Comput. Sci. & Software Eng., Auburn Univ., Auburn, AL, USA
Volume :
64
Issue :
2
fYear :
2015
fDate :
Feb. 2015
Firstpage :
556
Lastpage :
568
Abstract :
MapReduce is a popular parallel processing framework for large-scale data analytics. To keep up with the increasing volume of datasets, it requires efficient I/O capability from the underlying computer systems to process and analyze data in two phases (mapping and reducing). Between these phases, MapReduce requires a shuffling phase to globally exchange the intermediate data generated by the mapping phase. We reveal that data shuffling, by physically moving segments of intermediate data across disks, causes significant I/O contention and compounds the I/O problem. In this paper, we propose a novel virtual shuffling strategy to enable efficient data movement and reduce I/O for MapReduce shuffling, thereby reducing power consumption and conserving energy. Virtual shuffling is realized through a combination of three techniques: a three-level segment table, near-demand merging, and dynamic and balanced merging subtrees. Our experimental results show that virtual shuffling significantly speeds up data movement in MapReduce and achieves faster job execution. In particular, its reduction in disk I/O accesses results in as much as 12% savings in power consumption for MapReduce programs.
Keywords :
merging; parallel programming; power aware computing; tree data structures; I/O capability; I/O contention; MapReduce programs; computer systems; data analysis; data movement; data processing; disk I/O access; dynamic-balanced merging subtrees; energy conservation; global intermediate data exchange; intermediate data segments; job execution; large-scale data analytics; mapping phase; near-demand merging; parallel processing framework; power consumption reduction; reducing phase; three-level segment table; virtual shuffling phase; Computational modeling; Data models; Information management; Merging; Power demand; Tuning; Hadoop; MapReduce; near-demand merging; virtual shuffling;
fLanguage :
English
Journal_Title :
Computers, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9340
Type :
jour
DOI :
10.1109/TC.2013.216
Filename :
6671574