Title :
From the Cloud to the Atmosphere: Running MapReduce across Data Centers
Author :
Jayalath, Chamikara ; Stephen, Jose ; Eugster, Patrick
Author_Institution :
Dept. of Comput. Sci., Purdue Univ., West Lafayette, IN, USA
Abstract :
Efficiently analyzing big data is a major issue in our current era. Examples of analysis tasks include identification or detection of global weather patterns, economic changes, social phenomena, or epidemics. The cloud computing paradigm, along with software tools such as implementations of the popular MapReduce framework, offers a response to the problem by distributing computations among large sets of nodes. In many scenarios, however, input data are geographically distributed (geo-distributed) across data centers, and straightforwardly moving all data to a single data center before processing it can be prohibitively expensive. The above-mentioned tools are designed to work within a single cluster or data center and perform poorly or not at all when deployed across data centers. This paper deals with executing sequences of MapReduce jobs on geo-distributed data sets. We analyze possible ways of executing such jobs, and propose data transformation graphs that can be used to determine schedules for job sequences that are optimized with respect to either execution time or monetary cost. We introduce G-MR, a system for executing such job sequences, which implements our optimization framework. We present empirical evidence in Amazon EC2 and VICCI of the benefits of G-MR over common, naïve deployments for processing geo-distributed data sets. Our evaluations show that using G-MR significantly improves processing time and cost for geo-distributed data sets.
Keywords :
cloud computing; computer centres; data analysis; graph theory; optimisation; software tools; Amazon EC2; G-MR; MapReduce; VICCI; big data analysis; cloud computing paradigm; data centers; data transformation graphs; economic changes; epidemics; geo-distributed data sets; global weather patterns; job sequences; optimization framework; social phenomena; Geodistributed; big data; data center
Journal_Title :
Computers, IEEE Transactions on
DOI :
10.1109/TC.2013.121