DocumentCode
77683
Title
From the Cloud to the Atmosphere: Running MapReduce across Data Centers
Author
Jayalath, Chamikara ; Stephen, Jose ; Eugster, Patrick
Author_Institution
Dept. of Comput. Sci., Purdue Univ., West Lafayette, IN, USA
Volume
63
Issue
1
fYear
2014
fDate
Jan. 2014
Firstpage
74
Lastpage
87
Abstract
Efficiently analyzing big data is a major issue in our current era. Examples of analysis tasks include identification or detection of global weather patterns, economic changes, social phenomena, or epidemics. The cloud computing paradigm, along with software tools such as implementations of the popular MapReduce framework, offers a response to the problem by distributing computations among large sets of nodes. In many scenarios, however, input data are geographically distributed (geo-distributed) across data centers, and straightforwardly moving all data to a single data center before processing them can be prohibitively expensive. The above-mentioned tools are designed to work within a single cluster or data center and perform poorly or not at all when deployed across data centers. This paper deals with executing sequences of MapReduce jobs on geo-distributed data sets. We analyze possible ways of executing such jobs, and propose data transformation graphs that can be used to determine schedules for job sequences that are optimized with respect to either execution time or monetary cost. We introduce G-MR, a system for executing such job sequences, which implements our optimization framework. We present empirical evidence in Amazon EC2 and VICCI of the benefits of G-MR over common, naïve deployments for processing geo-distributed data sets. Our evaluations show that using G-MR significantly improves processing time and cost for geo-distributed data sets.
Keywords
cloud computing; computer centres; data analysis; graph theory; optimisation; software tools; Amazon EC2; G-MR; MapReduce; VICCI; big data analysis; cloud computing paradigm; data centers; data transformation graphs; economic changes; epidemics; geo-distributed data sets; global weather patterns; job sequences; optimization framework; social phenomena; Geodistributed; big data; data center
fLanguage
English
Journal_Title
Computers, IEEE Transactions on
Publisher
IEEE
ISSN
0018-9340
Type
jour
DOI
10.1109/TC.2013.121
Filename
6520848