• DocumentCode
    77683
  • Title
    From the Cloud to the Atmosphere: Running MapReduce across Data Centers
  • Author
    Jayalath, Chamikara; Stephen, Jose; Eugster, Patrick
  • Author_Institution
    Dept. of Comput. Sci., Purdue Univ., West Lafayette, IN, USA
  • Volume
    63
  • Issue
    1
  • fYear
    2014
  • fDate
    Jan. 2014
  • Firstpage
    74
  • Lastpage
    87
  • Abstract
    Efficiently analyzing big data is a major issue in our current era. Examples of analysis tasks include identification or detection of global weather patterns, economic changes, social phenomena, or epidemics. The cloud computing paradigm, along with software tools such as implementations of the popular MapReduce framework, offers a response to the problem by distributing computations among large sets of nodes. In many scenarios, however, input data are geographically distributed (geo-distributed) across data centers, and straightforwardly moving all data to a single data center before processing it can be prohibitively expensive. The above-mentioned tools are designed to work within a single cluster or data center and perform poorly or not at all when deployed across data centers. This paper deals with executing sequences of MapReduce jobs on geo-distributed data sets. We analyze possible ways of executing such jobs, and propose data transformation graphs that can be used to determine schedules for job sequences that are optimized with respect to either execution time or monetary cost. We introduce G-MR, a system for executing such job sequences, which implements our optimization framework. We present empirical evidence in Amazon EC2 and VICCI of the benefits of G-MR over common, naïve deployments for processing geo-distributed data sets. Our evaluations show that using G-MR significantly improves processing time and cost for geo-distributed data sets.
  • Keywords
    cloud computing; computer centres; data analysis; graph theory; optimisation; software tools; Amazon EC2; G-MR; MapReduce; VICCI; big data analysis; cloud computing paradigm; data centers; data transformation graphs; economic changes; epidemics; geo-distributed data sets; global weather patterns; job sequences; optimization framework; social phenomena; Geodistributed; big data; data center
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type
    jour
  • DOI
    10.1109/TC.2013.121
  • Filename
    6520848